Abstract: This paper studies decentralized, Fountain and
network-coding based strategies for facilitating data collection in
circular wireless sensor networks, which rely on the stochastic 
diversity of data storage. The goal is to allow for a reduced 
delay collection by a data collector who accesses the network 
at a random position and random time. Data dissemination is 
performed by a set of relays which form a circular route to ex- 
change source packets. The storage nodes within the transmission 
range of the route's relays linearly combine and store overheard 
relay transmissions using random decentralized strategies. An 
intelligent data collector first collects a minimum set of coded 
packets from a subset of storage nodes in its proximity, which 
might be sufficient for recovering the original packets and, by 
using a message-passing decoder, attempts recovering all original 
source packets from this set. Whenever the decoder stalls, the 
source packet which restarts decoding is polled/doped from its 
original source node. The random-walk-based analysis of the 
decoding/doping process furnishes the collection delay analysis 
with a prediction on the number of required doped packets. 
The number of doped packets can be surprisingly small when 
employed with an Ideal Soliton code degree distribution and, 
hence, the doping strategy may have the least collection delay 
when the density of source nodes is sufficiently large. Further- 
more, we demonstrate that network coding makes dissemination 
more efficient at the expense of a larger collection delay. Not 
surprisingly, a circular network allows for significantly more
(analytically and otherwise) tractable strategies than a
network whose model is a random geometric graph.

Keywords: decentralized Fountain codes, wireless net- 
works, network coding, distributed storage, data collection 

I. Introduction 

Wireless sensor networks (WSN) monitor and collect sensor 
data distributed over large physical areas. Sensor nodes are 
simple, battery-run devices with limited data processing, stor- 
age, and transmission capabilities. For energy efficiency rea- 
sons, the main data propagation model is hop-by-hop, where 
nodes relay other nodes' data. An important collection scenario 
is when a data sink (a collector) appears at a random position, 
at random time, and aims to collect all the k source data 
packets. The network's goal is to ensure that the data packets 
be efficiently disseminated and stored in a manner which 
allows for a low collection delay upon collector's arrival. This 
is achieved by storing data in a compact collection area at 
the fingertips of the data collector, i.e., at a set of connected 
(through multiple hops) wireless nodes in its proximity. There 
is a fundamental tradeoff between network storage capacity 
and the collection delay. If each node across the network can 
store all the packets, the data can be collected in a single hop. 
On the other extreme, if each node has its own data destined
for the collector and no capacity to store other packets, the 
collector has to reach out to all the network nodes to collect 
the data, which would incur an extreme delay. The canonical 
model considered here is when the number of source nodes k 
is smaller than the number of network nodes and where each 
network node can both relay and store one packet. 

Storing data at the fingertips of the randomly positioned 
collector implies redundant data storage across the network, 
whether it means simply storing source packet replicas, or 
random linear combinations thereof, and resulting respectively 
in a repetition code, or other linear code, implemented across 
a network. More efficient storage codes than the simplest 
repetition code require that the collector not only collects the 
linear combinations but also is capable of decoding/recovering 
the original packets. The two general classes of packet
combining (coding) techniques in [1], [2], [10], [12], and
[13] are: Fountain-type erasure codes [3], [10], [12], [13],
and decentralized erasure codes [1], a variant of random
linear network codes [5], [6]. The advantage of Fountain
type coding is in the linear complexity of decoding which, 
here, corresponds to linear original packet recovery time. The 
key difficulty of the Fountain type storage approach is in 
devising efficient techniques to disseminate data from multiple 
sources to network storage nodes in a manner which ensures 
that the required statistics of created linear combinations 
is accomplished. Achieving this goal is particularly difficult 
when employed with the classic random geometric graph
network models [12], [13].

In this paper we analyze decentralized Fountain-type network
coding strategies for facilitating a reduced delay data
collection and network coding schemes for efficient data
dissemination for a planar donut-shaped sensor network (see
Fig. 1) whose nodes lie between two concentric circles. The
network backbone is a circular route of relay nodes which 
disseminate data. All network nodes within its transmission 
range overhear relay's transmissions and serve as potential 
storage nodes. The storage nodes within a relay's transmission 
range form a squad. The squad size determines the relay's 
one-hop storage capacity. Squad's storage capacity together 
with the source node density and the coding/collection strategy 
determine the data collection delay measured in terms of the 
number of communication hops required for the collector to 
collect and recover all k source data packets. In the proposed 
polling (packet doping) scheme, an intelligent data collector 
(IDC) first collects a minimum set of coded packets from a 
subset of storage squads in its proximity (as in Fig. 1), which
might be sufficient for recovering the original packets and, by 
using a message-passing decoder, attempts recovering all orig- 
inal source packets from this set. Whenever the decoder stalls, 
the source packet which restarts decoding is polled/doped 
from its original source node (at an increased delay since 
this packet is likely not to be close to the collector). The
random-walk-based analysis of the decoding/doping process 
represents the key contribution of this paper. It furnishes the 
collection delay analysis with a prediction on the number 
of required doped packets. The number of required packet 
dopings is surprisingly small and, hence, to reduce the number 
of collection hops required to recover the source data, one 
should employ the doping collection scheme. The delay gain 
due to doping is more significant when the relay squad storage 
capacity is smaller. Furthermore, employing network coding 
makes dissemination more efficient at the expense of a larger 
collection delay. 

Not surprisingly, a circular network allows for a significantly
more (analytically) tractable strategy than a network
whose model is a random geometric graph (RGG) [12],
[13]. Besides, the RGG modeling implies that a packet is
forwarded to one of the neighbors in the network graph, while 
the fact that all the neighbors are overhearing the same trans- 
mission is not considered. In contrast, the proposed approach 
is aiming to incorporate the wireless multicast advantage ifTSl 
into the dissemination/storage model. In our earlier work ifTTI . 
we show how a randomly deployed network can self-organize 
into concentric donut-shaped networks. Note that the proposed 
topology is an especially good model for sensor networks 
deployed to monitor physical phenomena with linear spatial 
blueprint, such as road networks, vehicular networks, or border 
and pipeline-security sensor nets J!], Q.

II. System Model and Problem Formulation 

We consider an inaccessible static wireless sensor network 
(e.g., a disaster recovery network) with network nodes that 
are capable of sensing, relaying, and storing data. Nodes are 
randomly scattered in a plane according to a Poisson point 
process of some intensity $\mu$. The nodes have constrained
memory resources. Without loss of generality, we assume that
most nodes have a unit-size buffer. Each node that senses an
event creates a unit-size description data packet. We refer to
such a node as a data source. We assume that events are
distributed as a Poisson point process of intensity $\mu_s < \mu$.
We define the transmission range as the maximum distance
$r$ from the transmitter at which nodes can reliably receive
a packet. Assuming radially symmetric attenuation (isotropic
propagation), the transmitted packet is reliably received in a
disk of area $r^2 \pi$, illustrated in Figure 2. The expected number
of network nodes in the disk is $\mu r^2 \pi$ and the expected number
of source nodes is $\mu_s r^2 \pi$.

Within the sensor network, we consider a circular route, 
composed of k nodes referred to as relays. The distance 
between adjacent relays is equal to the transmission range r. 
Without loss of generality and for simplicity, we assume
that $r$ is selected so that only a single data source node is
(expected to be) within the transmission range of a relay. That
is, $\mu_s r^2 \pi = 1$. A source node observes an event and sends
its data packet to the relay, making it a virtual source (see
Figure 2). Thus, $k$ relays form a linear network (route) with
data packet $i$ assigned to relay $i$, $i \in \{1, \ldots, k\}$. Each sensor
node within the range of a relay is associated with the route 



via a one-hop connection to a relay. We refer to the set of 
nodes within the range of a relay as a squad, and to a node as 
squad-node. Squad nodes can hear transmissions either from
relay $i$ only or also from relay $i + 1$ and, thus, belong
either to the own set of squad-nodes $O_i$, or to the shared set of
squad-nodes, denoted $S_{i(i+1)}$, where, hereafter, any addition
operation will be assumed to be mod $k$, i.e., $(i + 1) \bmod k$,
as shown in Figure ID By means of associations, the relays 
in the circular route together with the squad nodes form a 
donut-shaped circular squad network. The expected number 
of nodes in the squad is denoted with $h = \mu r^2 \pi$, while the
expected size of each shared set is $E[|S_{i(i+1)}|] = \bar{h} = 0.4h$.
We primarily focus on shared squad nodes. In the rest of the
paper, whenever we refer to squad-nodes, we mean shared
nodes, and for simplicity we assume $\bar{h} \approx h$. The goal is to
disseminate data from all sources and store them at squad 
nodes so that a collector can recover all k original packets 
with minimum delay. An IDC collects data via a collection
relay. The data is collected from $k_x(k_d) = k_s + k_d$ storage
nodes of which most ($k_s \gg k_d$) reside in a set of $s$ adjacent
squads, including the collection relay squad. These $s$ squads
form a supersquad (see Figure 1). The number of packets not
collected from the supersquad is denoted $k_d$.
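
As a quick check of the factor 0.4 quoted for the shared set, the sketch below computes the overlap (lens) area of two transmission disks of radius $r$ whose centers are one transmission range apart, normalized by the disk area. The function name `shared_fraction` is ours and purely illustrative; the computation assumes ideal circular coverage.

```python
import math

def shared_fraction(spacing_over_r: float = 1.0) -> float:
    """Fraction of one transmission disk that is also covered by the
    neighboring relay's disk, for relays spacing_over_r * r apart."""
    x = spacing_over_r
    # Lens area of two unit-radius circles whose centers are x apart.
    lens = 2.0 * math.acos(x / 2.0) - (x / 2.0) * math.sqrt(4.0 - x * x)
    return lens / math.pi

print(round(shared_fraction(), 3))  # 0.391
```

With relays spaced exactly $r$ apart the ratio is $(2\pi/3 - \sqrt{3}/2)/\pi \approx 0.39$, in line with the rounded value 0.4 used in the text.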

Note that the density of sources $\mu_s$ is dependent on the
spatial characteristics of the monitored physical process, i.e.,
the spatial density of events. A well designed sensor network
will ensure that the spatial density of nodes $\mu$ is designed
to properly cover this process. When $r$ is selected to ensure
$r^2 \pi \mu_s = 1$, then $h = \mu / \mu_s$ is the coverage redundancy factor.
Furthermore, for a given received signal-to-noise ratio, the
one-hop transmission energy $E_1$ and the single hop delay $t_1$
are inversely proportional to $\mu_s$. And, for a given circular route
radius $R$, the (expected) number of relays is $k \sim R/r$. Hence,
for a given transmission range $r$ (or $\mu_s$), the only degree of
freedom is the coverage redundancy factor $h$ (squad size),
i.e., the network density $\mu$. By reducing $\mu$, we decrease the
average number of nodes in a squad $h$. This has implications
for the collection (delay and energy) cost. The supersquad
consists of $s = \lceil k_s / h \rceil$ squads, and the average number
of hops a packet makes until it is collected by the IDC is
$(s - 1)/4 + 1$. Hence, the smaller the $\mu$, the larger the average
collection delay $T_s = k_s ((s - 1)/4 + 1) t_1$ and the energy
$E_s = k_s ((s - 1)/4 + 1) E_1$ from the supersquad. Henceforth,
we will, without loss of generality, normalize $t_1 = 1$ and
$E_1 = 1$. The key collection performance measure will be the
average number of collection hops per source packet $c$, where
$c = k_s [1 + (s - 1)/4] / k$ when all collected packets are from
the supersquad, i.e., $k_x(0) = k_s$.
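
The dependence of the collection cost on the squad size can be illustrated with a short sketch. It evaluates the supersquad size as the number of upfront packets divided by the squad size, rounded up, and the per-source-packet hop count with the average of $(s-1)/4 + 1$ hops per collected packet, assuming the normalization of one-hop delay to 1 and that all packets come from the supersquad. The function name `collection_cost` is hypothetical.

```python
import math

def collection_cost(k: int, k_s: int, h: float):
    """Return (s, c): supersquad size and average collection hops per
    source packet when all k_s packets come from the supersquad."""
    s = math.ceil(k_s / h)            # squads needed to hold k_s packets
    hops_per_packet = 1 + (s - 1) / 4 # average hops to the collection relay
    return s, k_s * hops_per_packet / k

# Smaller squads (lower coverage redundancy) stretch the supersquad
# and raise the per-packet collection cost.
for h in (50, 10, 5):
    print(h, collection_cost(k=100, k_s=100, h=h))
```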

We will comparatively consider two classes of stor- 
age/encoding strategies: in the first, the IDC collects the 
original packets, while in the second one the collected packets 
are linear combinations of the original packets and, hence, 
the IDC needs to decode them to recover source packets. 
When combining is employed, constrained by the collection 
delay, we consider only storage strategies which allow for 
decoding methods of linear complexity, i.e., the use of belief 
propagation (BP) iterative decoders. Taking as a reference the 
case where original packets are encoded into coded packets 
whose linear combination degrees follow the Robust Soliton
distribution, as in [13], based on the asymptotic analysis of LT
codes [14], we expect that $k_x(0) = k_s = k + \sqrt{k} \log^2(k/\epsilon)$
collected code symbols are required to decode $(1 - \epsilon)k$ original
symbols, where $\epsilon$ is a sufficiently small constant. Here the
number of collected packets is significantly larger than k for 
small to medium number of sources k. Hence, collection of 
this many packets can be expensive, in particular when the 
event coverage redundancy factor h is small. Collecting a 
smaller number of packets upfront would result in a stalled 
decoding process. Here, we take advantage of the availability 
of additional replicas of source packets along the circular 
network, to pull one such packet off the network in order to 
continue the stalled decoding process (see Figure 1). The pull
phase is meant to assist the decoding process using a technique 
that we refer to as doping. In the following, encoding describes 
the mapping on the source packets employed both while 
disseminating and while storing. It is a mapping from the 
original $k$ packets to the collected $k_x(k_d) = k_s + k_d$ encoded
packets. 
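
To give a feel for why collecting the full Robust Soliton overhead upfront is costly for small to medium $k$, the sketch below evaluates $k + \sqrt{k}\log^2(k/\epsilon)$. Two assumptions are ours: the logarithm is taken as natural, and the constant factors hidden by the asymptotic analysis are set to one, so the numbers indicate the trend only.

```python
import math

def robust_soliton_upfront(k: int, eps: float) -> float:
    """Order-of-magnitude estimate k + sqrt(k) * ln(k/eps)^2 of the number
    of code symbols collected upfront (constants set to 1 by assumption)."""
    return k + math.sqrt(k) * math.log(k / eps) ** 2

for k in (100, 10_000, 1_000_000):
    total = robust_soliton_upfront(k, eps=0.01)
    print(k, round(total / k, 2))  # relative overhead shrinks as k grows
```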

III. Data Dissemination 

The nodes within the transmission range of the route relays 
together with the relays themselves form a dissemination 
network. The dissemination connectivity graph is a simple 
circular graph with k nodes. This graph models connections 
between relays, which are bidirectional. The connectivity 
graph used in the storage model is expanded with storage 
nodes, representing shared squad nodes. In this graph, every 
storage node is adjacent to two neighboring relay nodes. 
Also, edges between storage and relay nodes are directed, 
as illustrated in Figure 3. Every edge in the dissemination
graph is of unit capacity. A single transmission reaches two 
neighboring relays. 

We consider two dissemination methods: no combining in 
which each relay sends its own packet and forwards each 
received packet until it has seen all k network packets, and 
degree-two combining, described in Figure 5. For the
degree-two combining dissemination, a relay node combines the
packet received from its left with the packet received from
its right into a single packet by XOR-ing respective bits,
to provide innovative information to both neighboring relays
for the cost of one transmission [19]. Consequently, each
relay performs a total of $\lceil (k - 1)/2 \rceil$ first-hop exchanges, as
described in Figure 4 and in iflTl . Note that here storage
nodes overhear degree-two packet transmissions. They either 
randomly combine those with previously received degree-two 
packets, or they first apply the on-line decoding of the packets 
(see Figure [Hi, and then combine obtained degree-one pack- 
ets with previously stored linear combinations of degree-one 
packets. For further details about degree-two dissemination, 
the reader is referred to ifTol . 
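
A minimal sketch of degree-two combining on the relay ring follows. The scheduling is our own assumption (synchronous rounds, packets modeled as integers, the first round sending each relay's own packet uncoded); the function name and data layout are illustrative and are not taken from the figures referenced above. In every later round a relay broadcasts the XOR of the newest packet it learned from each side, and each neighbor cancels the packet it already holds to recover the new one.

```python
import math

def disseminate_degree_two(packets):
    """Simulate XOR-based dissemination on a ring of len(packets) relays.
    Returns (known, rounds), where known[i] maps source index -> packet."""
    k = len(packets)
    known = [{i: packets[i]} for i in range(k)]
    rounds = math.ceil((k - 1) / 2)
    for t in range(1, rounds + 1):
        hop = t - 1
        updated = [dict(d) for d in known]
        for i in range(k):
            left, right = (i - hop) % k, (i + hop) % k
            r_nb, l_nb = (i + 1) % k, (i - 1) % k
            if hop == 0:
                # First round: plain broadcast of the relay's own packet.
                updated[r_nb][i] = known[i][i]
                updated[l_nb][i] = known[i][i]
            else:
                coded = known[i][left] ^ known[i][right]
                # Right neighbor already holds x_right, so XOR leaves x_left.
                updated[r_nb][left] = coded ^ known[r_nb][right]
                # Left neighbor already holds x_left, so XOR leaves x_right.
                updated[l_nb][right] = coded ^ known[l_nb][left]
        known = updated
    return known, rounds

known, rounds = disseminate_degree_two(list(range(100, 107)))
print(rounds, all(len(d) == 7 for d in known))
```

Under these assumptions every relay transmits once per round, and after $\lceil (k-1)/2 \rceil$ rounds each relay holds all $k$ packets.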

IV. Decentralized Squad-Based Storage Encoding 

Under a centralized storage mechanism that would allow 
coordination between squad nodes, a unique packet could be 
assigned to each of k nodes located within a supersquad of an 



approximate size $k/h$, and the same procedure repeated around
the circular network for each set of k adjacent squad nodes. 
This periodic encoding procedure would allow a randomly 
positioned IDC to collect k original packets from the set 
of closest nodes. However, our focus is on scalable designs
where centralized solutions are not possible. We resort to
stochastic protocols for storing packet replicas, and apply 
random coding to store linear combinations of the packets. 
For each dissemination method we distinguish: combining and 
non-combining decentralized storage techniques. In both we 
assume that the storage squad nodes can hear (receive) any of 
the k dissemination transmissions from the neighboring relay 
nodes. Hence, either a common timing clock and/or regular
transmission listening is necessary. The reference example 
of non-combining (non-coding) methods is coupon collection 
storage, in which each squad node randomly selects one of 
k packets to store ahead of time. As the coupon collector 
is completely random, it requires on average k log k storage 
nodes to cover all the original packets. In order to decrease 
the probability of many packets not being covered, we apply 
combining storage techniques in which one storage node's 
encoded packet contains information that covers many original 
packets. The higher this code symbol degree is, the lower is 
the likelihood that a packet will stay uncovered. We consider 
combining either degree-two or degree-one packets. Each 
squad node samples a desired code symbol degree $d$ from
distribution $\omega(d)$, $d \in \{1, \ldots, k\}$. The squad node decides
ahead of time which subset of $d$ transmissions it will combine
to generate the stored encoded packet. Choosing a good
distribution $\omega(d)$ is not easy, since it needs to satisfy many
conflicting requirements. The high-degree code symbols are
good for decreasing the probability of uncovered packets. 
However, other requirements are more important for proper 
behavior of the BP decoding process, especially the right 
amount of degree one and degree two code symbols. It is 
well known that Ideal Soliton's (IS) expected behavior is close 
to ideal for Fountain codes decoded by a BP decoder, but 
the large variance may cause a frequent absence of degree- 
one symbols (the ripple) in the collected sample of code 
symbols, thus stalling the BP process. This is the reason why 
Robust Soliton (RS) is used as a choice degree distribution for 
rateless erasure codes. For RS, the probability of one-degree 
symbols is overdesigned in order to prevent stalling. However, 
redistribution of the probability mass from higher degrees to 
degree-one increases the likelihood of uncovered packets. In 
the next section, we present an analysis of why IS turns out 
to be better than RS when BP doping is used. 
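
The decentralized combining step can be sketched as follows, assuming degree-one (uncoded) dissemination so that a squad node overhears every source packet, and modeling packets as integers combined by XOR. The helper names `ideal_soliton_pmf`, `sample_degree` and `encode_squad_node` are hypothetical. The Ideal Soliton distribution used here assigns probability $1/k$ to degree one and $1/(d(d-1))$ to each degree $d \ge 2$.

```python
import random
from functools import reduce

def ideal_soliton_pmf(k):
    """pmf[d] for d = 0..k; pmf[0] is unused and kept at 0."""
    return [0.0, 1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def sample_degree(k, rng):
    """Draw a code symbol degree from the Ideal Soliton distribution."""
    pmf = ideal_soliton_pmf(k)
    return rng.choices(range(1, k + 1), weights=pmf[1:])[0]

def encode_squad_node(packets, rng):
    """Pick d transmissions ahead of time and store their XOR.
    Returns (chosen source indices, stored encoded packet)."""
    k = len(packets)
    d = sample_degree(k, rng)
    chosen = sorted(rng.sample(range(k), d))
    value = reduce(lambda a, b: a ^ b, (packets[i] for i in chosen))
    return chosen, value

rng = random.Random(7)
print(encode_squad_node([3, 5, 6, 9, 12], rng))
```

Half of the probability mass sits on degree two, which is what keeps the decoder supplied with symbols that can be released one step later.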

V. Collection and Decoding 

The collection problem with the coupon collector (and with 
similar non-combining storage methods) is straightforward 
as it excludes decoding. The focus is simply on providing 
coverage redundancy h that minimizes the size of supersquad 
containing k log k packets required to recover k source pack- 
ets. For the Fountain-based combining methods, the collection 
problem is more elaborate, and intricately tied to decoding 
strategy, which we study in the following subsections. 
A. Belief Propagation Decoding

Suppose that we have a set of $k_s$ code symbols that are
linear combinations of $k$ unique input symbols, indexed by
the set $\{1, \ldots, k\}$. Let the degrees of linear combinations be
random numbers that follow distribution $\omega(d)$ with support
$d \in \{1, \ldots, k\}$. Here, we equivalently use $\omega(d)$ and its generating
polynomial $\Omega(x) = \sum_{d=1}^{k} \Omega_d x^d$, where $\Omega_d = \omega(d)$.
Let us denote the graph describing the (BP) decoding process
at time $t$ by $G_t$ (see Figure 6). We start with a decoding matrix
$S_0 = [s_{ij}]_{k \times k}$, where code symbols are described using
columns, so that $s_{ij} = 1$ iff the $j$th code symbol contains the
$i$th input symbol. The number of ones in a column corresponds
to the degree of the associated code symbol. Input symbols
covered by the code symbols with degree one constitute the
ripple. In the first step of the decoding process, one input
symbol in the ripple is processed by being removed from
all neighboring code symbols in the associated graph $G_0$. If
the index of the input symbol is $m$, this effectively removes
the $m$th row of the matrix, thus creating the new decoding
matrix $S_1 = [s_{ij}]_{(k-1) \times k}$. We refer to the code symbols
modified by the removal of the processed input symbol as
output symbols. Output symbols of degree one may cover
additional input symbols and thus modify the ripple. Hence,
the distribution of output symbol degrees changes to $\Omega_1(x)$.
At each subsequent step of the decoding process one input
symbol in the ripple is processed by being removed from all
neighboring output symbols and all such output symbols that
subsequently have exactly one remaining neighbor are released
to cover that neighbor. Consequently, the support of the output
symbol degrees after $\ell$ input symbols have been processed
is $d \in \{1, \ldots, k - \ell\}$, and the resulting output degree
distribution is denoted by $\Omega_\ell(x)$. Our analysis of the presented
BP decoding process is based on the assumption that the ripple 
size relative to the number of higher degree symbols is small 
enough throughout the process. Consequently, we can ignore 
the presence of defected ripple symbols (redundant degree- 
one symbols) l?). Hence, the number of decoded symbols is 
increased by one with each processed ripple symbol. Now, 
let us assume that input symbols to be processed are not 
taken from the ripple, but instead provided to the decoder as 
side information. We refer to this mechanism of processing 
input symbols obtained as side information as doping. In 
particular, to unlock the belief propagation process stalled at 
time (iteration) t, the degree-two doping strategy selects the 
doping symbol from the set of input symbols connected to 
the degree-two output symbols in graph $G_t$, as illustrated in
Figure 6. Hence, the ripple evolution is affected in a different
manner, i.e., with the doping-enhanced decoding process the ripple
size does not necessarily decrease by one with each processed 
input symbol. 
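
The decoding and doping steps described above can be sketched as a peeling decoder. Code symbols are given as pairs (set of input indices, XOR of those inputs), with packets modeled as integers. The callback `fetch_source` is a hypothetical stand-in for polling a source node, and falling back to a random undecoded symbol when no degree-two output symbol exists is our assumption, since the text does not cover that case.

```python
import random

def bp_decode_with_doping(k, symbols, fetch_source, rng=None):
    """Peeling (BP) decoder with degree-two doping.
    Returns (decoded packets in index order, number of dopings)."""
    rng = rng or random.Random(0)
    nbrs = [set(idx) for idx, _ in symbols]  # unprocessed neighbors per symbol
    vals = [v for _, v in symbols]
    decoded, ripple, dopings = {}, [], 0

    def release(j):
        # Output symbol j has one neighbor left: it reveals that input.
        (i,) = nbrs[j]
        if i not in decoded:          # ignore redundant degree-one symbols
            decoded[i] = vals[j]
            ripple.append(i)

    for j in range(len(symbols)):
        if len(nbrs[j]) == 1:
            release(j)

    while len(decoded) < k:
        if not ripple:
            # Decoder stalled: dope an input adjacent to a degree-two symbol.
            cand = sorted({i for s in nbrs if len(s) == 2 for i in s})
            if not cand:
                cand = [i for i in range(k) if i not in decoded]
            i = rng.choice(cand)
            decoded[i] = fetch_source(i)
            ripple.append(i)
            dopings += 1
        i = ripple.pop()
        # Process input i: remove it from all neighboring output symbols.
        for j in range(len(symbols)):
            if i in nbrs[j] and len(nbrs[j]) > 1:
                nbrs[j].discard(i)
                vals[j] ^= decoded[i]
                if len(nbrs[j]) == 1:
                    release(j)
    return [decoded[i] for i in range(k)], dopings

x = [5, 9, 12, 7]
chain = [({0, 1}, x[0] ^ x[1]), ({1, 2}, x[1] ^ x[2]), ({2, 3}, x[2] ^ x[3])]
print(bp_decode_with_doping(4, chain, lambda i: x[i]))
```

In the usage example there is no degree-one symbol, so the decoder stalls immediately; a single doped packet then unlocks the whole chain of degree-two symbols.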

The following subsections study the behavior of both vari- 
eties of the BP decoding process, first through the evolution of 
symbol degrees higher than one, and in particular by demon- 
strating the ergodicity of the Ideal Soliton degree distribution, 
then by modeling and analyzing the ripple process, resulting 
in a unified model for both classical and doping-enhanced
decoding. Based on that model, we analyze the collection cost 



of the presented decoding strategies, when the starting $\omega(d)$
is Ideal Soliton. 

B. Symbol Degree Evolution 

In this subsection, we focus on the evolution of symbol 
degrees higher than one (unreleased symbols), and then an- 
alyze ripple evolution separately in the next subsection. The 
analysis of the evolution of unreleased output symbols is the 
same for both classical BP decoding case (without doping), 
and the doped BP decoding. We now present the model of 
the doping (decoding) process through the column degree 
distribution at each decoding/doping round. We model the
$\ell$th step of the decoding/doping process by selecting a row
uniformly at random from the set of $(k - \ell)$ rows in the current
decoding matrix $S_\ell = [s_{ij}]_{(k-\ell) \times k}$, and removing it from the
matrix. After $\ell$ rounds or, equivalently, when there are $k - \ell$
rows in the decoding matrix, the number of ones in a column
is denoted by $A_{k-\ell}$. The probability that the column is of
degree $d$, when its length is $k - \ell - 1$, $\ell \in \{1, \ldots, k - 3\}$, is
described iteratively

$$P(A_{k-\ell-1} = d) = P(A_{k-\ell} = d) \left(1 - \frac{d}{k-\ell}\right) + P(A_{k-\ell} = d+1) \frac{d+1}{k-\ell} \qquad (1)$$

for $2 \le d < k - \ell$, and $P(A_{k-\ell-1} = k - \ell) = 0$. Let the
starting distribution of the column degrees (for the decoding
matrix $S_0 = [s_{ij}]_{k \times k}$) be Ideal Soliton, denoted by $\rho(d)$,

$$\rho(d) = \frac{1}{d(d-1)} \quad \text{for } d = 2, \ldots, k, \qquad (2)$$

and $\rho(1) = \frac{1}{k}$. By construction, for $\ell = 0$, $P(A_k = d) = \rho(d)$,
which, together with (1), completely defines the dynamics
of the doping process when the Fountain code is based on
the Ideal Soliton. After rearranging and canceling appropriate
terms, we obtain, for $d \ge 2$,

$$P(A_{k-\ell} = d) = \begin{cases} \frac{k-\ell}{k} \rho(d), & d = 2, \ldots, k - \ell, \\ 0, & d > k - \ell. \end{cases} \qquad (3)$$

We assume that $k_s \approx k$ as, by design, we desire to have the
set of upfront collected symbols $k_s$ as small as the set of
source symbols. The probability of degree-$d$ symbols among
unreleased symbols $n = k_s - \ell$ can be approximated
with $\frac{P(A_{k-\ell} = d) k_s}{n} \approx \frac{P(A_{k-\ell} = d) k}{k - \ell}$. Hence, the probability
distribution of the unreleased output node degrees at
any time $\ell$ remains the Ideal Soliton

$$\omega_\ell(d) \approx \frac{k}{k-\ell} P(A_{k-\ell} = d) = \rho(d) \quad \text{for } d = 2, \ldots, k - \ell. \qquad (4)$$
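
The shape-preserving property can be verified numerically. The sketch below applies the row-removal recursion in exact rational arithmetic, under the reading that a column of degree $d$ keeps its degree with probability $1 - d/(k-\ell)$ and a column of degree $d+1$ drops to $d$ with probability $(d+1)/(k-\ell)$, and checks that after $\ell$ removals the probabilities for $d \ge 2$ equal the Ideal Soliton values scaled by $(k-\ell)/k$. Function names are illustrative.

```python
from fractions import Fraction

def ideal_soliton(k):
    """Exact Ideal Soliton probabilities rho[d], d = 1..k."""
    rho = {1: Fraction(1, k)}
    for d in range(2, k + 1):
        rho[d] = Fraction(1, d * (d - 1))
    return rho

def evolve_column_degrees(k, steps):
    """Probabilities P(A_{k-steps} = d) for d >= 2 after `steps`
    uniformly random row removals, starting from Ideal Soliton."""
    rho = ideal_soliton(k)
    p = {d: rho[d] for d in range(2, k + 1)}
    for ell in range(steps):
        rows = k - ell                     # rows before this removal
        nxt = {}
        for d in range(2, rows):           # degrees possible afterwards
            stay = p[d] * (1 - Fraction(d, rows))
            drop = p[d + 1] * Fraction(d + 1, rows)
            nxt[d] = stay + drop
        p = nxt
    return p

k, ell = 20, 7
p = evolve_column_degrees(k, ell)
rho = ideal_soliton(k)
print(all(p[d] == Fraction(k - ell, k) * rho[d] for d in p))  # True
```

Renormalizing by the factor $k/(k-\ell)$ therefore returns exactly $1/(d(d-1))$ for every surviving degree $d \ge 2$.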



C. Doped Ripple Evolution: Random Walk Model 

There exist comprehensive and thorough analytical models 
for the ripple evolution, characterizing the decoding of LT 
codes lUl, ifTsl . However, their comprehensive nature results 
in complex models that are difficult to evaluate. For describing the
dynamics of a doped decoder, we consider a simpler model,
which attempts to capture the ripple evolution for the Ideal
Soliton. Figure 7 and the code symbol degree evolution
analysis illustrate how the Ideal Soliton distribution main- 
tains its shape with decoding/doping. This fact, which results 
in a tractable ripple analysis and, more importantly, in an 
outstanding performance as illustrated in the last section, is 
our main motivator for selecting Ideal Soliton Fountain codes 
for our doping scheme. We study the number of symbols 
decoded between two dopings and, consequently, characterize 
the sequence of interdoping yields. The time at which the $i$th
doping occurs (or, equivalently, the decoding stalls for the
$i$th time) is a random variable $T_i$, and so is the interdoping
yield $Y_i = T_i - T_{i-1}$. Our goal is to obtain the expected
number of times the doping will occur by studying the ripple 
evolution. This goal is closely related to (a generalization of) 
the traditional studies of the fountain code decoding which 
attempt to determine the number of collected symbols $k_s$
required for the decoding to be achieved without a single
doping iteration, i.e., when $T_1 \ge k$.

Let the number of upfront collected coded symbols be
$k_s = k(1 + \delta)$, where $\delta$ is a small positive value. At time $\ell$ the total
number of decoded and doped symbols is $\ell$, and the number
of (unreleased) output symbols is $n = k_s - \ell = \lambda_\ell^{(\delta)} (k - \ell)$.
Here, $\lambda_\ell^{(\delta)} = 1 + \frac{k}{k-\ell} \delta$ is an increasing function of $\ell$. The
unreleased output symbol degree distribution polynomial at
time $\ell$ is $\Omega_\ell(x) = \sum_d \Omega_{d,\ell} x^d$, where $d = 2, \ldots, k - \ell$,
and $\Omega_{d,\ell} = \omega_\ell(d)$. In order to describe the ripple process
evolution, in the following we first characterize the ripple
increment when $\ell$ corresponds to the decoding and, next, when
it corresponds to a doping iteration.

Each decoding iteration processes a random symbol of
degree one from the ripple. Since the encoded symbols are
constructed by independently combining random input symbols,
we can assume that the input symbol covered by the
degree-one symbol is selected uniformly at random from the
set of undecoded symbols. Released output symbols are its
coded symbol neighbors whose output degree is two. Releasing
output symbols by processing a ripple symbol corresponds
to performing, on average, $n_2 = n \Omega_{2,\ell}$ independent Bernoulli
experiments with probability of success $p_2 = 2/(k - \ell)$.
Hence, the number of released symbols at any decoding
step $\ell$ is modeled by a discrete random variable $\Delta_\ell^{(\delta)}$ with
Binomial distribution $B(n \Omega_{2,\ell}, 2/(k - \ell))$, which for large
$n$ can be approximated with a (truncated) Poisson distribution
of intensity $2 \Omega_{2,\ell} \lambda_\ell^{(\delta)}$,

$$\Pr\{\Delta_\ell^{(\delta)} = r\} = \binom{n_2}{r} p_2^{r} (1 - p_2)^{n_2 - r} \approx \frac{\left(2 \Omega_{2,\ell} \lambda_\ell^{(\delta)}\right)^{r}}{r!} e^{-2 \Omega_{2,\ell} \lambda_\ell^{(\delta)}}, \quad r = 0, \ldots, n_2, \qquad (5)$$

where we have first applied the Stirling approximation to the
Binomial coefficient and, also, assumed that the probabilities
in (5) can be neglected unless $n_2$ is much larger than $r$.
According to (4), the fraction of degree-two output symbols
for Ideal Soliton based Fountain code is expected to be
$n_2 / n \approx \Omega_{2,\ell} = \rho(2) = 1/2$, for any decoding iteration $\ell$. Hence,

$$\Pr\{\Delta_\ell^{(\delta)} = r\} \approx \eta(r) = \frac{\left(\lambda_\ell^{(\delta)}\right)^{r}}{r!} e^{-\lambda_\ell^{(\delta)}}, \quad r = 0, \ldots, n_2, \qquad (6)$$

or, equivalently, $\Delta_\ell^{(\delta)} \sim \wp(\lambda_\ell^{(\delta)})$, where $\wp(\cdot)$ denotes Poisson
distribution. For each decoding iteration, one symbol is
taken from the ripple and $\Delta_\ell^{(\delta)}$ symbols are added, so that the
increments of the ripple process can be described by random
variables $X_\ell = \Delta_\ell^{(\delta)} - 1$ with the probability distribution
$\eta(r + 1)$ (for $X_\ell = r$) characterized by the generating
polynomial $I(x) = \sum_{d \ge 0} \eta(d) x^{d-1}$ and the expected value
$\lambda_\ell^{(\delta)} - 1$. Next we describe the ripple increment for the doping
iteration, where a carefully selected input symbol is revealed
at time $T_i = t_i$ when the ripple is empty (random degree-two
doping). The number of degree-two output symbols at time
$T_i = t_i$ is $n_2 = \rho(2) n = n/2$, where $n = \lambda_{t_i}^{(\delta)} (k - t_i)$.
Degree-two doping selects uniformly at random a row in the
decoding matrix $S_{t_i}$ that has one or more non-zero elements
in columns of degree two. This is equivalent to randomly
selecting a column of degree two to be released, and restarting
the ripple (i.e., same as decoding) with any of its two input
symbols from the decoding matrix whose number of degree-two
columns is now $n_2 - 1 \approx n_2$, for large $n_2$. Hence, the
doping ripple increment can be described by unit increase
in addition to an increase equivalent to the one obtained
through decoding but without the ripple decrement of 1. That
is, statistically, the doping ripple increment $X_{t_i}^{D}$ is a random
variable described by $I^{D}(x) = \sum_{d \ge 0} \eta(d) x^{d+1}$, corresponding to
the shifted distribution $\eta(r - 1)$ for $X_{t_i}^{D} = r$.
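
The quality of the Poisson approximation to the Binomial release count can be checked numerically. The sketch below assumes exactly half of the $n = k_s - \ell$ unreleased symbols have degree two, uses success probability $2/(k-\ell)$, and compares the result with a Poisson law of intensity $1 + k\delta/(k-\ell)$, the value implied by $n = k(1+\delta) - \ell$. The function name, the parameter values and the truncation at `r_max` are our own choices.

```python
import math

def release_count_models(k, ell, delta, r_max=40):
    """Return (Poisson intensity, Binomial mean, truncated total-variation
    distance) for the number of symbols released in one decoding step."""
    k_s = round(k * (1 + delta))
    n = k_s - ell                 # unreleased output symbols
    n2 = n // 2                   # degree-two symbols under Ideal Soliton
    p2 = 2.0 / (k - ell)          # chance a degree-two symbol is released
    lam = 1.0 + k * delta / (k - ell)
    binom = [math.comb(n2, r) * p2 ** r * (1 - p2) ** (n2 - r)
             for r in range(r_max + 1)]
    pois = [math.exp(-lam) * lam ** r / math.factorial(r)
            for r in range(r_max + 1)]
    tv = 0.5 * sum(abs(b - p) for b, p in zip(binom, pois))
    return lam, n2 * p2, tv

lam, mean, tv = release_count_models(k=1000, ell=100, delta=0.05)
print(round(lam, 4), round(mean, 4), tv < 0.005)
```

For these parameters the Binomial mean coincides with the Poisson intensity, and the two distributions differ only in the third decimal place of total variation.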

Now if, for the doping instant $t = t_{i-1}$, we define
$X_{t_{i-1}} = X^{D}_{t_{i-1}} - 2$, the ripple size for $t \in [t_{i-1}, t_i]$ can be described
in a unified manner with $S_{t,i} + 2$ where

$$S_{t,i} = \sum_{j = t_{i-1}}^{t} X_j \qquad (7)$$

is a random walk modeling the ripple evolution. Note that the
ripple increments $X_\ell$ are not IID random variables, since the
intensity of $\eta(d)$ changes with each iteration $\ell$. However, for
analytical tractability, we study the interdoping time using the
random walk model in (7), by assuming that $\lambda_\ell^{(\delta)}$ changes from
doping to doping, but remains constant within the interdoping
interval. Under this assumption, the ripple size $S_{t,i} + 2$ is
a partial sum of IID random variables $X_j$, of the expected
value $\lambda_{T_{i-1}}^{(\delta)} - 1$. Note that, when $\delta = 0$, i.e., when $k_s = k$,
$S_{t,i}$ is a zero mean random walk. In this special case, we
treat the doping-enhanced BP process as (an approximate)
renewal process, where the process starts all over after each
doping. Modeling and analyzing this particular case is much
easier, resulting in a closed-form expression for the expected
number of dopings. We later refer to this case to provide some
intuition. The expected interdoping yield is the expected time it
takes for the ripple random walk $S_{t,i} + 2$ to become zero. Using
random walk terminology, we are interested in the statistics
of the random-walk stopping time. The stopping time is the
time at which the decoding process stalls, counting from the
previous doping time, where the first decoding round starts
with the 0th doping which occurs at $T_0 = 0$. Hence, the $i$-th
stopping time (doping) $T_i$ is defined as

$$T_i = \min\{\min\{t_i : S_{t,i} + 2 \le 0\}, k\}. \qquad (8)$$



We study the Markov chain model of the random walk $S_{t,i}$. Each possible value of the random walk represents a state of the Markov chain (MC) described by the probability transition matrix $P_i$. State $v$, $v \in \{1, \dots, k\}$, corresponds to the ripple of size $v - 1$. State 1 is the trapping state, with the (auto)transition probability $P_{i,11} = 1$, and models the stopped random walk. Hence, based on (6), we have the state transition probabilities

$$P_{i,11} = 1, \qquad (9)$$
$$P_{i,v(v+b)} = \eta(b+1), \quad v = 2, \dots, k, \quad b = -1, \dots, \min([\,\cdot\,],\, k - v),$$

and $P_{i,vw} = 0$ otherwise, resulting in a transition probability matrix of the following almost Toeplitz form

$$P_i = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \eta(0) & \eta(1) & \eta(2) & \cdots & \\ 0 & \eta(0) & \eta(1) & \cdots & \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \eta(0) & \eta(1) \end{bmatrix}_{k \times k} \qquad (10)$$

with $\eta(\cdot) = \wp\big(\lambda^{(\delta)}\big)$. The decoding process is modeled by the MC being in the initial state $v = 3$ (equivalent to the ripple of size two). Based on that, the probability of being in the trapping state, while at time $t > T_i$, is

$$p_t^{(T_i)} = [0\ 1\ 0 \cdots 0]\; P_i^{\,t - T_i}\; [1\ 0 \cdots 0]^{T}. \qquad (11)$$

Hence, the probability of entering the trapping state at time $t$ is

$$p^{(T_i)}(u) = p_t^{(T_i)} - p_{t-1}^{(T_i)} = [0\ 1 \cdots 0]\left(P_i^{\,u} - P_i^{\,u-1}\right)[1\ 0 \cdots 0]^{T}, \qquad (12)$$

where $u = t - T_i$. $\{T_i\}$ is a sequence of stopping-time random variables, where the index $i$ identifies a doping round. $Y_i = T_i - T_{i-1}$, $i \ge 1$, is a stopping time interval of a random walk of (truncated) Poisson IID random variables of intensity $\lambda^{(\delta)}_{T_{i-1}}$ and can be evaluated using the following recursive probability expression

$$\Pr\{Y_i = 0\} = \Pr\{Y_i = 1\} = 0, \qquad (13)$$
$$\Pr\{Y_i = t + 1\} = \eta(0)\left(H^{(t)}(t-1) - \sum_{z=1}^{t-1} \Pr\{Y_i = t - z\}\, H^{(z)}(z+1)\right), \quad 1 \le t < k,$$

obtained from (12) after a series of matrix transformations. Here, $\eta(0)$ is the Poisson pdf of intensity $\lambda^{(\delta)}_{T_{i-1}}$ evaluated at 0, and $H^{(s)}(d)$ is the $s$-tuple convolution of $\eta(\cdot)$ evaluated at $d$, resulting in a Poisson pdf of intensity $s\lambda^{(\delta)}_{T_{i-1}}$ evaluated at $d$. The complete derivation of (13) is given in the Appendix. Note that the intensity $\lambda^{(\delta)}_{T_{i-1}}$ is, in general, a random variable and that the sequence of doping times $T_i$ is a Markov chain. Hence, the number of decoded symbols after the $h$th doping, a partial sum $D_h = \sum_{i=1}^{h} Y_i$ of interdoping yields, is a Markov-modulated random walk.
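The recursion in (13) can be evaluated numerically. The sketch below is a minimal illustration, assuming a fixed intensity `lam` within the interdoping interval and ignoring the truncation of the Poisson increments; the function names are hypothetical and not taken from the paper. It reads (13) as a first-passage decomposition: the walk that starts at a ripple of size two reaches size one after $t$ steps, minus the paths that already emptied the ripple earlier, followed by one step with a zero increment.

```python
import math

def poisson_pmf(mean, d):
    """Poisson probability of the value d for a given positive mean."""
    if d < 0:
        return 0.0
    return math.exp(-mean + d * math.log(mean) - math.lgamma(d + 1))

def interdoping_pmf(lam, k):
    """Return pr with pr[y] = Pr{Y = y}, y = 0..k, following recursion (13).

    lam is the (assumed constant) Poisson intensity of eta; H^(s)(d) is the
    Poisson pmf of intensity s * lam evaluated at d.
    """
    pr = [0.0] * (k + 1)                 # Pr{Y = 0} = Pr{Y = 1} = 0
    eta0 = poisson_pmf(lam, 0)
    for t in range(1, k):
        reach = poisson_pmf(t * lam, t - 1)            # H^(t)(t - 1)
        earlier = sum(pr[t - z] * poisson_pmf(z * lam, z + 1)
                      for z in range(1, t))
        pr[t + 1] = eta0 * (reach - earlier)
    return pr

# Zero-mean case (delta = 0), for a small illustrative k.
pmf = interdoping_pmf(lam=1.0, k=50)
```

The shortest possible interdoping yield is two (both ripple symbols are processed with zero increments), which gives a quick sanity check on the first nonzero term.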

The expected number of dopings sufficient for complete decoding is the stopping time of the random walk $D_h$, where the stopping threshold is $k - u_{k_s}$. Here, based on the coupon collection model, $u_{k_s}$ is the expected number of uncovered symbols (which, necessarily, have to be doped) when $k_s$ coded symbols are collected:

$$u_{k_s} = k\left(1 - \frac{1}{k}\right)^{k(1+\delta)\log k} \approx k\, e^{-(1+\delta)\log k}. \qquad (14)$$

The total number of dopings is the stopping time random variable $D$ defined as

$$D = \min\{h : D_h + u_{k_s} > k\}. \qquad (15)$$


Our model can be further simplified by replacing $T_{i-1}$ with $l_i = \sum_{t=1}^{i-1} E[Y_t \mid T_{t-1} = l_t]$ in the intensity $\lambda^{(\delta)}_{T_{i-1}}$, thus allowing for a direct recursive computation in (13). Hence,

$$E[Y_i \mid T_{i-1} = l_i] = \sum_{t=1}^{k - l_i} t \Pr\{Y_i = t\} + \left(1 - \sum_{t=1}^{k - l_i} \Pr\{Y_i = t\}\right)(k - l_i). \qquad (16)$$

Furthermore, we can approximate $D_h \approx \sum_{i=1}^{h} E[Y_i \mid T_{i-1} = l_i]$ and use the algorithm in Figure 16 (based on (13)) to calculate the expected number of dopings.

In the special case when $\delta = 0$, further simplifying assumptions lead to the approximation that all interdoping yields are described by a single random variable $Y$ whose pdf is given by the following recursive expression, based on (13):

$$\Pr\{Y = t + 1\} = \eta(0)\left(\wp^{(t)}(t-1) - \sum_{z=1}^{t-1} \Pr\{Y = t - z\}\,\wp^{(z)}(1+z)\right), \qquad (17)$$

where $\wp^{(s)}(d)$ denotes the Poisson distribution of intensity $s$, evaluated at $d$, and $t \in [0, k-1]$. The range of $t$ varies from doping to doping, i.e., if $T_{i-1} = l_i$, then $Y_i$ would have support $t \in [l_i, k-1]$, and, hence, this single-variable approximation is accurate when both the ripple size is small and $l_i \ll k$. We now approximate the expected value of the interdoping yield $Y$ as

$$E[Y] \approx \sum_{t=1}^{k} t \Pr\{Y = t\} + \left(1 - \sum_{t=1}^{k} \Pr\{Y = t\}\right) k. \qquad (18)$$

Now, the doping process $D_h$ is a renewal process, and thus the Wald equality [16] implies that the mean stopping time is

$$E[D] = k / E[Y].$$
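As a rough numerical illustration of the renewal approximation, the sketch below evaluates the single-variable recursion (17) with unit intensity, the truncated mean (18), and the Wald estimate $E[D] = k/E[Y]$. It is only a sketch under the stated simplifications ($\delta = 0$, untruncated Poisson increments, uncovered symbols ignored), and the function names are hypothetical.

```python
import math

def poisson_pmf(mean, d):
    """Poisson probability of the value d for a given positive mean."""
    if d < 0:
        return 0.0
    return math.exp(-mean + d * math.log(mean) - math.lgamma(d + 1))

def yield_pmf(k):
    """pr[y] = Pr{Y = y} for y = 0..k, from recursion (17) with eta = Poisson(1)."""
    pr = [0.0] * (k + 1)
    eta0 = math.exp(-1.0)
    for t in range(1, k):
        reach = poisson_pmf(float(t), t - 1)           # Poisson of intensity t at t - 1
        earlier = sum(pr[t - z] * poisson_pmf(float(z), z + 1)
                      for z in range(1, t))
        pr[t + 1] = eta0 * (reach - earlier)
    return pr

def expected_yield(k):
    """Approximation (18): probability mass beyond k is assigned the value k."""
    pr = yield_pmf(k)
    mass = sum(pr[1:])
    return sum(t * pr[t] for t in range(1, k + 1)) + (1.0 - mass) * k

def expected_dopings(k):
    """Wald-equality estimate of the mean number of dopings, k / E[Y]."""
    return k / expected_yield(k)
```

Under this approximation the doping ratio `expected_dopings(k) / k` shrinks as `k` grows, which is in line with the observation that the number of doped packets can be small for the IS distribution.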

VI. Comparative Cost Analysis 

The summary of the proposed approach to dissemination, storage, and collection with doping, based on IS combining for storage and random degree-two doping for collection, is given in Figure 8. We here analyze the performance of this approach in terms of data collection cost. The cost of the upfront collection from the nearby nodes in the supersquad, $1 + (s - 1)/4$, is significantly smaller than the collection cost when the packets are polled from their original source relays, which is on average $k/4$. Nevertheless, in this section, we show that the number of doped packets $k_d$ will be sufficiently smaller than the residual number of undecoded symbols when the belief propagation process first stalls, so that their collection cost is offset, and the overall collection cost is reduced relative to the original strategy. We quantify the performance of the decoding process through the doping ratio $k_d/k$. Figure 9 illustrates the dramatic reduction of the overhead $(k_T(k_d) - k)/k$ when employing doping with an IS distribution relative to the overhead of RS encoding without doping. Figure 10 demonstrates that RS with doping performs markedly worse than IS encoding. In particular, it illustrates that IS with doping demonstrates a very low variance, which is surprisingly different from the results without doping.

In Section II we characterized a circular squad network by its node density $\mu$ and its source density $\mu_s$, so that network scaling can be expressed through the scaling of the coverage redundancy factor $h = \mu/\mu_s$. Figure 11 illustrates the importance of considering coverage redundancy when selecting a storage/collection strategy: in the case of degree-two dissemination, when the size of the supersquad $s$ increases, the fountain code strategy improves (in terms of a reduced doping ratio $k_d/k$ required for decoding) due to an increase in mixing. As a result of the mechanism described in Figures 4 and 5, the degree-two combinations in adjacent squads have a significant number of common packets. Hence, when forming code symbols by combining degree-two packets, we encounter a higher code symbol dependency and an increased number of redundant symbols in the ripple, which increases the probability of uncovered input symbols. Our doping overhead accounts for uncovered symbols, since ultimately they need to be pulled off the original sources for the complete data recovery. With increased supersquad size $s$ (or, equivalently, a decreased $h$) the mixing of input symbols is improved and this negative effect is alleviated. This dependency is not present in the case of degree-one dissemination. Figure 12 gives the corresponding required doping $k_d/k$ as a function of $k$ for a fixed squad size $h = 200$.

The cost minimization problem for any encoding scheme with (and without) doping is described as follows. Let the pair $(k_s, k_d)$ be a feasible number of encoded and doped packets, i.e., one that is sufficient for decoding the original $k$ packets. The per-source-packet collection cost for this pair is

$$c_T(h) = [c_s(h)\,k_s + c_d\,k_d]/k \qquad (19)$$

where $c_s(h) = 1 + (s(h) - 1)/4$ is the average collection cost from the supersquad of size $s(h) = \lceil k_s/h \rceil$ and $c_d = \lceil k/4 \rceil$ is the average collection cost when polling doped packets from the original source relays. Examples of $(k_s, k_d)$ pairs are $(0, k)$ for the pure polling mechanism with cost $c_T(h) = c_d = \lceil k/4 \rceil$, and $(k_s = k + \sqrt{k}\log^2(k/\delta),\ 0)$ on average for degree-one dissemination and RS fountain encoding with average per-packet cost $c_T \approx c_s(h)k_s/k$. For any given encoding mechanism and the set of feasible pairs $(k_s, k_d)$, the minimum per-packet collection cost is $c_{\min}(h) = \min_{(k_s, k_d)} c_T(h)$. The effect on the doping percentage of increasing the number of upfront collected symbols $k_s$ above $k$ (described by our general model of interdoping times) is illustrated in Figure 13. Figure 14 illustrates the per-packet collection cost above minimum, based on (19), as a function of the number of packets $(k_s - k)/k$ collected from the supersquad in excess of $k$, for different values of coverage redundancy $h$, and IS encoding. For the range of coverage redundancies that may be of practical value (up to 50), the minimum collection cost is obtained for $k_{s,\min}/k \in (1, 1.05)$. Figure 15 illustrates the per-packet cost $c_T(h)/(k/4)$ normalized to the reference polling cost as a function of $\mu_s/\mu = 1/h$, the relative density of source nodes, for a network with $k = 2000$ source packets. Four strategies are included, all based on degree-one packet dissemination: reference polling, degree-one coupon collection, RS with no doping, and the IS encoding with a feasible doping pair $(k_s, k_d)$. Note that the proposed scheme is inferior to the RS-based scheme only for a very low density of events, i.e., when $h > 1000$.
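The cost model in (19) is simple enough to evaluate directly. The sketch below is an illustration only: it uses the supersquad cost $1 + (s - 1)/4$ quoted at the start of this section, and the feasible pairs passed to `min_cost` are hypothetical placeholders, not values taken from the paper's simulations.

```python
import math

def collection_cost(k, k_s, k_d, h):
    """Per-source-packet collection cost c_T(h) of (19) for a pair (k_s, k_d)."""
    s = math.ceil(k_s / h)          # supersquad size s(h)
    c_s = 1 + (s - 1) / 4           # average hops per packet from the supersquad
    c_d = math.ceil(k / 4)          # average hops per doped (polled) packet
    return (c_s * k_s + c_d * k_d) / k

def min_cost(k, h, feasible_pairs):
    """Minimum of c_T(h) over a given set of feasible (k_s, k_d) pairs."""
    return min(collection_cost(k, k_s, k_d, h) for k_s, k_d in feasible_pairs)

# Pure polling: every packet is fetched from its source relay.
polling = collection_cost(k=2000, k_s=0, k_d=2000, h=10)
# Hypothetical doped pair: k coded packets upfront plus 100 doped packets.
doped = min_cost(k=2000, h=50, feasible_pairs=[(0, 2000), (2000, 100)])
```

Because $c_d$ grows linearly with $k$ while $c_s$ depends only on the supersquad size, even a modest number of doped packets can dominate the total, which is why the balance between $k_s$ and $k_d$ matters.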

In conclusion, in this paper we showed that, for the circular squad network, the total collection cost can be reduced by applying a packet combining degree distribution that is congruous to doping, applying a good doping mechanism, and balancing the cost of upfront collection and doping, given the coverage redundancy factor $h$. The proposed network model, which includes a route of relays and the nodes overhearing the relays' transmissions, is chosen based on a range of sensor network applications that monitor physical phenomena with a linear spatial blueprint, such as road networks and border-security sensor nets. In order to limit the scope of the paper, we here omit describing a more general setup in which our network model can be used. However, we argue that networks of different (non-linear) topology may use dissemination mechanisms that produce shortest routing paths from data sources to the collection node, suggesting a collection cost analysis based on these "linear route networks" and, hence, similar to the one presented here. This is one of the reasons we treat data dissemination separately from data collection in our cost analysis.
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Appendix 

Random Walk Ripple Evolution: The Stopping Time Probability

Recall that for the Markov chain model of the ripple evolution, described by (9) and (10), its trapping state corresponds to the empty ripple. The probability of entering the trapping state at time $t$, where $t > T_i$, is given in (12), where $u = t - T_i$. The probability of being in the trapping state at $T_i + u$ can also be expressed as $p_{T_i + u}^{(T_i)} = [0\ 1 \cdots 0]\, P_i^{\,u-1}\, [1\ \eta(0)\ 0 \cdots 0]^{T}$. Hence, we can reformulate (12) as

$$p^{(T_i)}(u) = [0\ 1 \cdots 0]\, P_i^{\,u-1}\, [0\ \eta(0) \cdots 0]^{T}. \qquad (20)$$

Note that both $[0\ 1 \cdots 0]$ and $[0\ \eta(0) \cdots 0]^{T}$ have zero-valued first elements, which means that the first row and the first column of the transition probability matrix $P_i$ do not contribute to the value of (20). Hence, we introduce a new matrix $\tilde{P}_i$ which contains the significant elements of $P_i$ as

$$\tilde{P}_i = \begin{bmatrix} \eta(1) & \eta(2) & \eta(3) & \cdots & \\ \eta(0) & \eta(1) & \eta(2) & \cdots & \\ 0 & \eta(0) & \eta(1) & \cdots & \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \eta(0) & \eta(1) \end{bmatrix}_{(k-1) \times (k-1)} \qquad (21)$$

with $\eta(\cdot) = \wp\big(\lambda^{(\delta)}\big)$. Now,

$$p^{(T_i)}(u) = \eta(0)\, [0\ 1 \cdots 0]\, \tilde{P}_i^{\,u-1}\, [1 \cdots 0]^{T}. \qquad (22)$$



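Assuming $n$ is large, we can approximately express the $u$th power of the matrix $\tilde{P}_i$ through a matrix that contains elements $H^{(u)}(\cdot)$ of the $u$th convolution of the pdf array $\eta = [\eta(0)\ \eta(1) \cdots]$. Let us define $\eta$ as the degree-one convolution. For the order-two convolution, we convolve $\eta$ with itself, and the $u$th convolution of $\eta$ is obtained by recursively convolving the $(u-1)$th convolution with $\eta$. By multiplying the matrix

$$P_i^{C} = \begin{bmatrix} \eta(0) & \eta(1) & \eta(2) & \cdots \\ 0 & \eta(0) & \eta(1) & \cdots \\ \vdots & & \ddots & \ddots \\ 0 & \cdots & \eta(0) & \eta(1) \end{bmatrix}, \qquad (23)$$

which was obtained by adding the column $[\eta(0)\ 0 \cdots]^{T}$ in front of $\tilde{P}_i$, and another matrix

$$P_i^{R} = \begin{bmatrix} \eta(2) & \eta(3) & \eta(4) & \cdots \\ \eta(1) & \eta(2) & \eta(3) & \cdots \\ \eta(0) & \eta(1) & \eta(2) & \cdots \\ \vdots & & \ddots & \ddots \end{bmatrix}, \qquad (24)$$

which was obtained by adding the row $[\eta(2)\ \eta(3)\ \eta(4) \cdots]$ above $\tilde{P}_i$, we obtain

$$P_i^{C} P_i^{R} = \begin{bmatrix} H^{(2)}(2) & H^{(2)}(3) & \cdots \\ H^{(2)}(1) & H^{(2)}(2) & \cdots \\ H^{(2)}(0) & H^{(2)}(1) & \cdots \\ \vdots & & \ddots \end{bmatrix} \qquad (25)$$

$$= D^{(2)}, \qquad (26)$$

where $H^{(s)}(d)$ is the $s$-th convolution of $\eta(\cdot)$ evaluated at $d$, and $D^{(2)}$ is what we refer to as the second convolution matrix of $\eta$, for $\eta(\cdot) = \wp\big(\lambda^{(\delta)}\big)$. Hence,

$$\tilde{P}_i^{2} = D^{(2)} - [\eta(0)\ 0 \cdots]^{T}\, [\eta(2)\ \eta(3) \cdots] \qquad (27)$$
$$= D^{(2)} - [\eta(0)\ 0 \cdots]^{T}\, [H^{(1)}(2)\ H^{(1)}(3)\ H^{(1)}(4) \cdots].$$

By induction,

$$\tilde{P}_i^{3} = D^{(3)} - [\eta(0)\ 0 \cdots]^{T}\, [H^{(2)}(3)\ H^{(2)}(4) \cdots] - \tilde{P}_i\, [\eta(0)\ 0 \cdots]^{T}\, [H^{(1)}(2)\ H^{(1)}(3) \cdots], \qquad (28)$$

$$\tilde{P}_i^{u} = D^{(u)} - \sum_{z=2}^{u} S^{(u)}(z),$$

$$S^{(u)}(z) = \tilde{P}_i^{\,u-z}\, [\eta(0)\ 0 \cdots]^{T}\, [H^{(z-1)}(z)\ H^{(z-1)}(z+1) \cdots].$$

Replacing (27) in (22), we obtain (13).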
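The matrix identity behind (23)-(27) can be checked numerically on a small example. The sketch below is an illustration under assumptions: it uses an untruncated Poisson $\eta$ with unit intensity and a hypothetical matrix size `m`, and it writes the reduced matrix of (21) as `P`. The block identity $\tilde{P}^2 = P^C P^R - [\eta(0)\ 0 \cdots]^T [\eta(2)\ \eta(3) \cdots]$ holds exactly, while the entries of $P^C P^R$ equal the second convolution $H^{(2)}$ only away from the right edge of the finite matrix, which is where the large-size assumption enters.

```python
import math

def eta(d, lam=1.0):
    """Poisson pmf of intensity lam at d (zero for negative d)."""
    return math.exp(-lam) * lam ** d / math.factorial(d) if d >= 0 else 0.0

def conv2(d):
    """Second convolution H^(2)(d) of eta with itself."""
    return sum(eta(a) * eta(d - a) for a in range(d + 1))

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

m = 6                                                               # hypothetical size
P = [[eta(c - r + 1) for c in range(m)] for r in range(m)]          # reduced matrix (21)
PC = [[eta(c - r) for c in range(m + 1)] for r in range(m)]         # column added in front (23)
PR = [[eta(c - r + 2) for c in range(m)] for r in range(m + 1)]     # row added above (24)

D2 = matmul(PC, PR)                                                 # second convolution matrix
P2 = matmul(P, P)
# Outer product [eta(0) 0 ...]^T [eta(2) eta(3) ...]: only its first row is nonzero.
outer = [[(eta(0) if r == 0 else 0.0) * eta(c + 2) for c in range(m)] for r in range(m)]
max_err = max(abs(P2[r][c] - (D2[r][c] - outer[r][c]))
              for r in range(m) for c in range(m))
```

Here `max_err` measures how far the two sides of (27) are apart on the finite example.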
Fig. 2. Close-up of a circular squad network of k relays. We assume there is one source per relay on average: the source entrusts its data packet to the closest relay, hence making it a virtual source. Each relay is overheard by nodes in its transmission range, referred to as squad nodes.

Fig. 1. Collection of coded symbols: the pull phase brings the three squads of coded packets to the decoder, and then, whenever the decoder gets stalled, an original symbol is pulled off the network for doping. We here deliberately omit to show that the squad nodes are overhearing (belonging to) two adjacent relays in order to highlight the two-phase collection, as opposed to the storage protocol.

Fig. 3. Circular Squad Network: the storage graph.
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Fig. 4. Dissemination procedure brings all network data to each relay in 
half as many hops as would be needed with a simple forwarding scheme:
example for k = 7 follows the exchanges of node 1, where the black circle
on the bottom represents the node's receiver while each gray circle above it
represents the transmitter at the corresponding dissemination round.
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Initialization: 

k=1: Relay i sends its own packet p(i), and subsequently
receives the packets p(i-1) and p(i+1) originating from
its first-hop neighbors.

k=2: Relay i sends a linear combination (XOR) of the received
packets p(i-1) and p(i+1), and subsequently receives the
packets containing p(i) XOR-ed with the packets p(i-2)
and p(i+2) originating from its second-hop neighbors,
respectively. Relay i recovers p(i-2) and p(i+2) by
XOR-ing the received linear combinations with p(i).

For (k = 3, k < (n + 1)/2, k++)

Online Decoding

The packets received by relay i in the (k-1)th round
contain linear combinations of packets p(i-k+2) and
p(i+k-2) and packets p(i-k+1) and p(i+k-1), originating
from its (k-1)th-hop neighbors. XOR-ing the received
packets with the matching packets p(i-k+2) and p(i+k-2),
the relay recovers the packets p(i-k+1) and p(i+k-1).

Storing

The buffer space is updated with the recovered original
packets p(i-k+1) and p(i+k-1). For k > 3 the buffer
space is updated by overwriting packets p(i-k+4) and
p(i+k-4).

Encoding

In the kth round, relay i linearly combines packets
p(i-k+1) and p(i+k-1) and transmits the linear
combination.

Fig. 5. Degree-two Dissemination Algorithm

[Plots: upper panel "Modified IS after l = 500 decodings w/o deg-0 and deg-1 (decoded and the ripple)"; lower panel "Ideal Soliton for k = 500 input symbols (k = 1000 - l)"]

Fig. 7. Density Evolution of IS distribution due to uniform doping. The upper graph is the distribution of the output symbols after l = 500 decodings, for initial number of collected code symbols k = 1000; the lower graph is the IS with support set {1, ..., (k = 1000 - l)} as if we are starting with the matrix of the same size (initial number of collected code symbols k = 1000 - l) as the matrix doped in round l.
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Fig. 6. In the graph Gt, representing the stalled decoding process at time t, 
we identify nodes on the left side (input symbols corresponding to rows of the 
incidence matrix) connected to right-hand-side nodes of degree two (output 
nodes corresponding to columns of weight two, represented by black nodes, 
and pointed to by black arrows), and then uniformly at random select one 
such input symbol to unlock the decoder. The set of symbols we are selecting 
from is represented by red nodes, indicated by red arrows. 



Dissemination and Storage: 

degree-one/two dissemination of k source packets; each
storage node stores a random linear combination of d
disseminated packets; d is drawn from IS rho(d).
Upfront collection:

IDC collects k_s encoded packets from s closest storage
squads.

Belief propagation decoding and doping-collection:

l = 0: number of processed source packets
k_{r,l}: number of packets in the ripple
k_d = 0: number of doped packets.
For (l = 0, l < k, l++)
    while k_{r,l} = 0
        Collect (from the source relay) and dope the decoder with
        a source packet contributing to a randomly selected
        degree-two (or larger) output packet.
        k_d++; l++;
    endwhile
    Process a symbol from the ripple; update k_{r,l};
endfor

Fig. 8. Proposed dissemination, storage, and doping collection 
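The decoding and doping loop of Fig. 8 can be sketched as a small simulation. This is a minimal sketch under assumptions, not the authors' implementation: coded packets are tracked only by the set of source indices they combine (no payloads), degrees are drawn from the Ideal Soliton distribution, and on a stall the doped source packet is taken from a randomly chosen degree-two coded packet, which is a simplification of the uniform selection described in Fig. 6. The function names and the `seed` parameter are hypothetical.

```python
import random

def ideal_soliton_degree(k, rng):
    """Draw a degree from the Ideal Soliton: rho(1) = 1/k, rho(d) = 1/(d(d-1))."""
    u = rng.random()
    cumulative = 1.0 / k
    if u < cumulative:
        return 1
    for d in range(2, k + 1):
        cumulative += 1.0 / (d * (d - 1))
        if u < cumulative:
            return d
    return k  # guards against floating-point round-off

def doped_bp_decode(k, k_s, seed=0):
    """Return the number of dopings used to recover all k source packets."""
    rng = random.Random(seed)
    # Each coded packet is the set of still-unresolved source indices it combines.
    coded = [set(rng.sample(range(k), ideal_soliton_degree(k, rng)))
             for _ in range(k_s)]
    recovered = set()
    dopings = 0
    while len(recovered) < k:
        ripple = [next(iter(c)) for c in coded if len(c) == 1]
        if ripple:
            symbol = ripple[0]                      # process a ripple symbol
        else:
            # Decoder stalled: dope from a degree-two coded packet if one
            # exists, otherwise from any coded packet with unresolved inputs.
            degree_two = [c for c in coded if len(c) == 2]
            pool = degree_two or [c for c in coded if c]
            if pool:
                symbol = rng.choice(sorted(rng.choice(pool)))
            else:                                   # only uncovered packets remain
                symbol = rng.choice([i for i in range(k) if i not in recovered])
            dopings += 1
        recovered.add(symbol)
        for c in coded:
            c.discard(symbol)                       # remove the resolved symbol
    return dopings

k_d = doped_bp_decode(k=200, k_s=200, seed=1)
```

Each pass through the loop resolves exactly one new source packet, so the loop always terminates after `k` iterations; the returned count is the simulated $k_d$ for one random code realization.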
[Plot: "Overhead for IS (Ideal Soliton with degree-2 doping) and L (LT simulation)"]

Fig. 9. Overhead (doping) percentage: we define k_T(k_d) = k_s + k_d as the number of symbols collected in both collection phases, and the collection overhead ratio as (k_T(k_d) - k)/k, which allows us to compare the overhead for the simulated LT decoding of k original symbols and the simulated degree-two doped belief-propagation decoding of k coded symbols with the IS degree distribution. The LT overhead bound is the analytical bound by Luby [?]. The IS doping bound is the analytical bound based on the algorithm given in Figure 16.

[Plot: x-axis "s: number of squads in the supersquad of size 1000"]

Fig. 11. Doping percentage as a function of supersquad size when code symbols are linear combinations of degree-two packets: for a fixed number of upfront collected symbols k_s = 1000, encoded by the degree-two IS method, the squad size (node density) is changed, so that the supersquad contains 1, 2, 5, and 10 squads. The more squads there are, the more intense is the data mixing, decreasing the probability of non-covered original symbols.

[Plot: "Ideal Soliton with degree-two input symbols"; x-axis "n: number of symbols to decode"]

Fig. 12. The encoding process emulates supersquads with fixed squad size h = 200 and the degree-two input symbols overheard within the supersquad: the resulting doping percentage for IS degree distribution of stored code symbols.

Fig. 10. Doping percentage with initial IS code symbol degree distribution vs RS. Both mean and variance are much smaller for Ideal Soliton.
Fig. 13. Doping percentage for different values of δ = k_s/k - 1. Emulation
results are obtained based on our analytical model and the algorithm in Figure 16.



Fig. 15. Collection delay for various collection techniques, normalized with
respect to the polling cost, as a function of 1/h. Note that the proposed
doping strategy is inferior to polling only when there are no other nodes
but relays. For very large squads (> 1000), the proposed doped IS code
induces a sufficiently large polling cost (usually to start the process, as the IS
sample is likely not to have degree-one symbols) which offsets (and exceeds)
the cost due to overhead packets solicited from the supersquad with the RS-
based strategy without doping. The coupon collection (non-coding) strategy
is consistently worse by an order of magnitude than the RS-based fountain
encoding and is worse than polling for high source densities (small squads
with tens of nodes).




Fig. 14. Collection delay (hop count) above minimum per input symbol for
different values of coverage redundancy h as a function of δ. Note that there
is an optimal δ for each h at which the delay is minimized: for h = 10, δ is
one percent; for h = 15, it is 3%; for h = 30, δ = 4%.



Initialization:

l_1 = 0, D = 0
For (i = 1, D < k, i++)
    Calculate λ^(δ)(l_i)
    Using (13), calculate Pr{Y_i = t} for t < k - l_i
    Using (16), calculate E[Y_i]
    D = D + E[Y_i]
    l_(i+1) = D
k_d = i, p_d = 100 k_d / k

Fig. 16. Calculation of the expected doping percentage p_d based on the
number of upfront collected symbols
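The iteration of Fig. 16 can be sketched in a few lines. This is a hedged sketch under assumptions: the intensity $\lambda^{(\delta)}(l)$ is supplied by the caller as a hypothetical callable `intensity` (its formula is not restated here), the Poisson increments are not truncated, the number of loop iterations is taken as $k_d$, and the function names are illustrative only.

```python
import math

def poisson_pmf(mean, d):
    """Poisson probability of the value d for a given positive mean."""
    if d < 0:
        return 0.0
    return math.exp(-mean + d * math.log(mean) - math.lgamma(d + 1))

def expected_yield(lam, horizon):
    """E[Y_i | T_{i-1} = l_i] per (16), with horizon = k - l_i and pmf from (13)."""
    pr = [0.0] * (horizon + 1)
    eta0 = poisson_pmf(lam, 0)
    for t in range(1, horizon):
        reach = poisson_pmf(t * lam, t - 1)
        earlier = sum(pr[t - z] * poisson_pmf(z * lam, z + 1) for z in range(1, t))
        pr[t + 1] = eta0 * (reach - earlier)
    mass = sum(pr)
    return sum(t * p for t, p in enumerate(pr)) + (1.0 - mass) * horizon

def expected_doping_percentage(k, intensity):
    """Accumulate expected interdoping yields until k symbols are decoded."""
    decoded = 0.0          # D in Fig. 16
    position = 0.0         # l_i in Fig. 16
    rounds = 0
    while decoded < k:
        rounds += 1
        horizon = k - int(position)                  # remaining symbols, at least 1
        decoded += expected_yield(intensity(position), horizon)
        position = decoded
    return 100.0 * rounds / k

# Constant unit intensity corresponds to the delta = 0 special case.
p_zero_mean = expected_doping_percentage(200, lambda l: 1.0)
```

A larger intensity gives the ripple a positive drift, so fewer rounds are needed before `decoded` reaches `k`; comparing a unit intensity with a larger constant one illustrates the effect of collecting more symbols upfront.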
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Abstract 

This paper studies decentralized, Fountain and network-coding based strategies for facilitating data
collection in circular wireless sensor networks, which rely on the stochastic diversity of data storage. The 
goal is to allow for a reduced delay collection by a data collector who accesses the network at a random 
position and random time. Data dissemination is performed by a set of relays which form a circular route 
to exchange source packets. The storage nodes within the transmission range of the route's relays linearly 
combine and store overheard relay transmissions using random decentralized strategies. An intelligent data 
collector first collects a minimum set of coded packets from a subset of storage nodes in its proximity, 
which might be sufficient for recovering the original packets and, by using a message-passing decoder, 
attempts recovering all original source packets from this set. Whenever the decoder stalls, the source packet 
which restarts decoding is polled/doped from its original source node. The random-walk-based analysis of 
the decoding/doping process furnishes the collection delay analysis with a prediction on the number of 
required doped packets. The number of doped packets can be surprisingly small when employed with an 
Ideal Soliton code degree distribution and, hence, the doping strategy may have the least collection delay 
when the density of source nodes is sufficiently large. 
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I. Introduction 



Wireless sensor networks (WSN) monitor and collect sensor data distributed over large physical 
areas. Sensor nodes are simple, battery-run devices with limited data processing, storage, and 
transmission capabilities. For energy efficiency reasons, the main data propagation model is hop- 
by-hop, where nodes relay other nodes' data. An important collection scenario is when a data sink 
(a collector) appears at a random position, at random time, and aims to collect all the k source 
data packets. The network's goal is to ensure that the data packets be efficiently disseminated and 
stored in a manner which allows for a low collection delay upon collector's arrival. This is achieved 
by storing data in a compact collection area at the fingertips of the data collector, i.e., at a set of 
connected (through multiple hops) wireless nodes in its proximity. There is a fundamental tradeoff 
between network storage capacity and the collection delay. If each node across the network can 
store all the packets, the data can be collected in a single hop. On the other extreme, if each node 
has own data destined for the collector and no capacity to store other packets, the collector has to 
reach out to all the network nodes to collect the data, which would incur an extreme delay. The 
canonical model considered here is when the number of source nodes k is smaller than the number 
of network nodes and where each network node can both relay and store one packet. 

Storing data at the fingertips of the randomly positioned collector implies redundant data 
storage across the network, whether it means simply storing source packet replicas, or random 
linear combinations thereof, and resulting respectively in a repetition code, or other linear code, 
implemented across a network. More efficient storage codes than the simplest repetition code require 
that the collector not only collects the linear combinations but also is capable of decoding/recovering 
the original packets. The two general classes of packet combining (coding) techniques in [?], [?], 
[?], [?], and [?] are: Fountain-type erasure codes [?], [?], [?], [?], and decentralized erasure codes 
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[?] - a variant of random linear network codes [?], [?]. The advantage of Fountain type coding is 
in the linear complexity of decoding which, here, corresponds to linear original packet recovery 
time. The key difficulty of the Fountain type storage approach is in devising efficient techniques 
to disseminate data from multiple sources to network storage nodes in a manner which ensures 
that the required statistics of created linear combinations is accomplished. Achieving this goal is 
particularly difficult when employed with the classic random geographic graph network models [?], 
[?]. 

In this paper we analyze decentralized Fountain-type network coding strategies for facilitating 
a reduced delay data collection and network coding schemes for efficient data dissemination for a 
planar donut-shaped sensor network (see Fig. 1) whose nodes lie between two concentric circles.
The network backbone is a circular route of relay nodes which disseminate data. All network nodes 
within its transmission range overhear relay's transmissions and serve as potential storage nodes. 
The storage nodes within a relay's transmission range form a squad. The squad size determines the 
relay's one-hop storage capacity. Squad's storage capacity together with the source node density and 
the coding/collection strategy determine the data collection delay measured in terms of the number 
of communication hops required for the collector to collect and recover all k source data packets. 
In the proposed polling (packet doping) scheme, an intelligent data collector (IDC) first collects a 
minimum set of coded packets from a subset of storage squads in its proximity (as in Fig. [B, which 
might be sufficient for recovering the original packets and, by using a message-passing decoder, 
attempts recovering all original source packets from this set. Whenever the decoder stalls, the source 
packet which restarts decoding is polled/doped from its original source node (at an increased delay 
since this packet is likely not to be close to the collector). The random-walk-based analysis of the 
decoding/doping process represents the key contribution of this paper. It furnishes the collection
delay analysis with a prediction on the number of required doped packets. The number of required
packet dopings is surprisingly small and, hence, to reduce the number of collection hops required 
to recover the source data, one should employ the doping collection scheme. The delay gain due to 
doping is more significant when the relay squad storage capacity is smaller. Furthermore, employing 
network coding makes dissemination more efficient at the expense of a larger collection delay. 

Not surprisingly, a circular network allows for a significantly more (analytically) tractable strategy 
relative to a network whose model is a random geometric graph (RGG) [?], [?]. Besides, the 
RGG modeling implies that a packet is forwarded to one of the neighbors in the network graph, 
while the fact that all the neighbors are overhearing the same transmission is not considered. In 
contrast, the proposed approach is aiming to incorporate the wireless multicast advantage [?] into the 
dissemination/storage model. In our earlier work [?], we show how a randomly deployed network 
can self-organize into concentric donut-shaped networks. Note that the proposed topology is an 
especially good model for sensor networks deployed to monitor physical phenomena with linear 
spatial blueprint, such as road networks, vehicular networks, or border and pipeline- security sensor 
nets [?], [?]. 

II. System Model and Problem Formulation 

We consider an inaccessible static wireless sensor network (e.g., a disaster recovery network) with network nodes that are capable of sensing, relaying, and storing data. Nodes are randomly scattered in a plane according to a Poisson point process of some intensity $\mu$. The nodes have constrained memory resources. Without loss of generality, we assume that most nodes have a unit-size buffer. Each node that senses an event creates a unit-size description data packet. We refer to such a node as a data source. We assume that events are distributed as a Poisson point process of intensity $\mu_s < \mu$. We define the transmission range as the maximum distance $r$ from the transmitter at which nodes can reliably receive a packet. Assuming radially symmetric attenuation (isotropic propagation), the transmitted packet is reliably received in a disk of area $r^2\pi$, illustrated in Figure 2. The expected number of network nodes in the disk is $\mu r^2\pi$ and the expected number of source nodes is $\mu_s r^2\pi$.

Within the sensor network, we consider a circular route, composed of $k$ nodes referred to as relays. The distance between adjacent relays is equal to the transmission range $r$. Without loss of generality and for simplicity, we assume that $r$ is selected so that only a single data source node is (expected to be) within the transmission range of a relay. That is, $\mu_s r^2\pi = 1$. A source node observes an event and sends its data packet to the relay, making it a virtual source (see Figure 2). Thus, $k$ relays form a linear network (route) with data packet $i$ assigned to relay $i$, $i \in [1, \dots, k]$. Each sensor node within the range of a relay is associated with the route via a one-hop connection to a relay. We refer to the set of nodes within the range of a relay as a squad, and to a node as a squad node. Squad nodes can hear transmissions either from relay $i$ only or also from relay $i + 1$ and, thus, belong either to the own set of squad nodes $O_i$ or to the shared set of squad nodes, denoted $S_{i(i+1)}$, where, hereafter, any addition operation will be assumed to be mod $k$, i.e., $(i + 1) \bmod k$, as shown in Figure 3. By means of associations, the relays in the circular route together with the squad nodes form a donut-shaped circular squad network. The expected number of nodes in the squad is denoted by $h = \mu r^2\pi$, while the expected size of each shared set is $E[|S_{i(i+1)}|] = \tilde{h} = 0.4h$. We primarily focus on shared squad nodes. In the rest of the paper, whenever we refer to squad nodes, we mean shared nodes, and for simplicity we assume $\tilde{h} = h$. The goal is to disseminate data from all sources and store them at squad nodes so that a collector can recover all $k$ original packets with minimum delay. An IDC collects data via a collection relay. The data is collected from $k_T(k_d) = k_s + k_d$ storage nodes, of which most ($k_s \gg k_d$) reside in a set of $s$ adjacent squads, including the collection relay squad. These $s$ squads form a supersquad (see Figure 1). The number of packets not collected from the supersquad is denoted $k_d$.

Note that the density of sources $\mu_s$ is dependent on the spatial characteristics of the monitored physical process, i.e., the spatial density of events. A well designed sensor network will ensure that the spatial density of nodes $\mu$ is designed to properly cover this process. When $r$ is selected to ensure $r^2\pi\mu_s = 1$, then $h = \mu/\mu_s$ is the coverage redundancy factor. Furthermore, for a given received signal-to-noise ratio, the one-hop transmission energy $E_1$ and the single-hop delay $\tau_1$ are inversely proportional to $\mu_s$. And, for a given circular route radius $R$, the (expected) number of relays is $k = R/r$. Hence, for a given transmission range $r$ (or $\mu_s$), the only degree of freedom is the coverage redundancy factor $h$ (squad size), i.e., the network density $\mu$. By reducing $\mu$, we decrease the average number of nodes in a squad $h$. This has implications for the collection (delay and energy) cost. The supersquad consists of $s = \lceil k_s/h \rceil$ squads, and the average number of hops a packet makes until it is collected by the IDC is $(s - 1)/4 + 1$. Hence, the smaller the $h$, the larger the average collection delay $\tau_s = k_s((s - 1)/4 + 1)\tau_1$ and the energy $E_s = k_s((s - 1)/4 + 1)E_1$ from the supersquad. Henceforth, we will, without loss of generality, normalize $\tau_1 = 1$ and $E_1 = 1$. The key collection performance measure will be the average number of collection hops per source packet $c$, where $c = k_s[1 + (s - 1)/4]/k$ when all collected packets are from the supersquad, i.e., $k_T(0) = k_s$.

We will comparatively consider two classes of storage/encoding strategies: in the first, the IDC
collects the original packets, while in the second one the collected packets are linear combinations
of the original packets and, hence, the IDC needs to decode them to recover source packets. When
combining is employed, constrained by the collection delay, we consider only storage strategies
which allow for decoding methods of linear complexity, i.e., the use of belief propagation (BP)
iterative decoders. Taking as a reference the case where original packets are encoded into coded
packets whose linear combination degrees follow the Robust Soliton distribution, as in [?], based on
the asymptotic analysis of LT codes [?], we expect that $k_T(0) = k_s = k + \sqrt{k}\log^2(k/\epsilon)$ collected
code symbols are required to decode $(1 - \epsilon)k$ original symbols, where $\epsilon$ is a sufficiently small
constant. Here the number of collected packets is significantly larger than $k$ for a small to medium
number of sources $k$. Hence, collection of this many packets can be expensive, in particular when
the event coverage redundancy factor $h$ is small. Collecting a smaller number of packets upfront
would result in a stalled decoding process. Here, we take advantage of the availability of additional
replicas of source packets along the circular network, to pull one such packet off the network in
order to continue the stalled decoding process (see Figure 1). The pull phase is meant to assist the
decoding process using a technique that we refer to as doping. In the following, encoding describes
the mapping on the source packets employed both while disseminating and while storing. It is a
mapping from the original $k$ packets to the collected $k_T(k_d) = k_s + k_d$ encoded packets.

III. Data Dissemination 

The nodes within the transmission range of the route relays together with the relays themselves 
form a dissemination network. The dissemination connectivity graph is a simple circular graph with 
k nodes. This graph models connections between relays, which are bidirectional. The connectivity 
graph used in the storage model is expanded with storage nodes, representing shared squad nodes. 
In this graph, every storage node is adjacent to two neighboring relay nodes. Also, edges between 
storage and relay nodes are directed, as illustrated in Figure 3. Every edge in the dissemination
graph is of unit capacity. A single transmission reaches two neighboring relays. 

We consider two dissemination methods: no combining, in which each relay sends its own packet
and forwards each received packet until it has seen all $k$ network packets, and degree-two combining,
described in Figure 5. For the degree-two combining dissemination, a relay node combines the
packet received from its left with the packet received from its right into a single packet by XOR-ing
respective bits, to provide innovative information to both neighboring relays for the cost of
one transmission [?]. Consequently, each relay performs a total of $\lceil (k - 1)/2 \rceil$ first-hop exchanges,
as described in Figure 4 and in [?]. Note that here storage nodes overhear degree-two packet
transmissions. They either randomly combine those with previously received degree-two packets,
or they first apply the on-line decoding of the packets (see Figure 5), and then combine the obtained
degree-one packets with previously stored linear combinations of degree-one packets. For further
details about degree-two dissemination, the reader is referred to [?].
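
To illustrate why one XOR-combined transmission is innovative for both neighbors, the following minimal Python sketch may help. It is not taken from the paper: the helper name `xor_bytes` and the two-byte example payloads are assumptions made only for this illustration.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length packets (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("packets must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Example payloads (assumed): what the relay heard from each side.
left_packet = b"\x0f\xa0"    # received from the left neighbor
right_packet = b"\x33\x55"   # received from the right neighbor

# One degree-two transmission serves both neighbors.
combined = xor_bytes(left_packet, right_packet)

# The left neighbor already holds left_packet, so it recovers right_packet;
# the right neighbor recovers left_packet in the same way.
recovered_by_left = xor_bytes(combined, left_packet)
recovered_by_right = xor_bytes(combined, right_packet)
```

Each neighbor cancels the packet it already knows, which is why a single broadcast replaces two separate forwarding transmissions.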

IV. Decentralized Squad-Based Storage Encoding 

Under a centralized storage mechanism that would allow coordination between squad nodes, a 
unique packet could be assigned to each of k nodes located within a supersquad of an approximate 
size k/h, and the same procedure repeated around the circular network for each set of k adjacent 
squad nodes. This periodic encoding procedure would allow a randomly positioned IDC to collect 
k original packets from the set of closest nodes. However, our focus is on scalable designs where
centralized solutions are not possible. We resort to stochastic protocols for storing packet replicas, 
and apply random coding to store linear combinations of the packets. For each dissemination method 
we distinguish: combining and non-combining decentralized storage techniques. In both we assume 
that the storage squad nodes can hear (receive) any of the k dissemination transmissions from the 
neighboring relay nodes. Hence, either a common timing clock or/and regular transmission listening 
is necessary. The reference example of non-combining (non-coding) methods is coupon collection 
storage, in which each squad node randomly selects one of k packets to store ahead of time. As the 
coupon collector is completely random, it requires on average k log k storage nodes to cover all the 
original packets. In order to decrease the probability of many packets not being covered, we apply
combining storage techniques in which one storage node's encoded packet contains information that
covers many original packets. The higher this code symbol degree is, the lower is the likelihood that
a packet will stay uncovered. We consider combining either degree-two or degree-one packets. Each
squad node samples a desired code symbol degree $d$ from distribution $\omega(d)$, $d \in \{1, \dots, k\}$. The
squad node decides ahead of time which subset of $d$ transmissions it will combine to generate the
stored encoded packet. Choosing a good distribution $\omega(d)$ is not easy, since it needs to satisfy many
contradicting requirements. The high-degree code symbols are good for decreasing the probability 
of uncovered packets. However, other requirements are more important for proper behavior of the 
BP decoding process, especially the right amount of degree one and degree two code symbols. It is 
well known that Ideal Soliton's (IS) expected behavior is close to ideal for Fountain codes decoded 
by a BP decoder, but the large variance may cause a frequent absence of degree-one symbols (the 
ripple) in the collected sample of code symbols, thus stalling the BP process. This is the reason 
why Robust Soliton (RS) is used as a choice degree distribution for rateless erasure codes. For 
RS, the probability of one-degree symbols is overdesigned in order to prevent stalling. However, 
redistribution of the probability mass from higher degrees to degree-one increases the likelihood 
of uncovered packets. In the next section, we present an analysis of why IS turns out to be better 
than RS when BP doping is used. 
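
The trade-off between the two degree distributions can be made concrete with a short Python sketch. The Ideal Soliton follows the definition used in this paper; the Robust Soliton follows the standard LT-code construction. The function names and the parameter values `c = 0.1` and `delta = 0.5` are assumptions chosen only for illustration.

```python
import math


def ideal_soliton(k):
    """Return rho[0..k]; rho[1] = 1/k and rho[d] = 1/(d(d-1)) for d >= 2."""
    rho = [0.0] * (k + 1)
    rho[1] = 1.0 / k
    for d in range(2, k + 1):
        rho[d] = 1.0 / (d * (d - 1))
    return rho


def robust_soliton(k, c=0.1, delta=0.5):
    """Standard Robust Soliton; c and delta are illustrative assumptions."""
    R = c * math.log(k / delta) * math.sqrt(k)
    pivot = int(round(k / R))
    tau = [0.0] * (k + 1)
    for d in range(1, min(pivot, k + 1)):
        tau[d] = R / (d * k)           # extra mass on low degrees
    if 1 <= pivot <= k:
        tau[pivot] = R * math.log(R / delta) / k   # spike at degree k/R
    rho = ideal_soliton(k)
    beta = sum(rho) + sum(tau)         # normalization constant
    return [(r + t) / beta for r, t in zip(rho, tau)]


ideal = ideal_soliton(1000)
robust = robust_soliton(1000)
```

Comparing `ideal[1]` with `robust[1]` shows the probability mass that RS moves to degree one, which is the redistribution discussed above.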

V. Collection and Decoding 

The collection problem with the coupon collector (and with similar non-combining storage 
methods) is straightforward as it excludes decoding. The focus is simply on providing coverage 
redundancy h that minimizes the size of supersquad containing k log k packets required to recover 
k source packets. For the Fountain-based combining methods, the collection problem is more 
elaborate, and intricately tied to the decoding strategy, which we study in the following subsections.

A. Belief Propagation Decoding 

Suppose that we have a set of $k_s$ code symbols that are linear combinations of $k$ unique input
symbols, indexed by the set $\{1, \dots, k\}$. Let the degrees of the linear combinations be random numbers
that follow the distribution $\omega(d)$ with support $d \in \{1, \dots, k\}$. Here, we equivalently use $\omega(d)$ and its
generating polynomial $\Omega(x) = \sum_{d=1}^{k} \Omega_d x^d$, where $\Omega_d = \omega(d)$. Let us denote the graph describing
the (BP) decoding process at time $t$ by $G_t$ (see Figure 6). We start with a decoding matrix
$S_0 = [s_{ij}]_{k \times k_s}$, where code symbols are described using columns, so that $s_{ij} = 1$ iff the $j$th code symbol
contains the $i$th input symbol. The number of ones in a column corresponds to the degree of the
associated code symbol. Input symbols covered by the code symbols with degree one constitute the
ripple. In the first step of the decoding process, one input symbol in the ripple is processed by being
removed from all neighboring code symbols in the associated graph $G_0$. If the index of the input
symbol is $m$, this effectively removes the $m$th row of the matrix, thus creating the new decoding
matrix $S_1 = [s_{ij}]_{(k-1) \times k_s}$. We refer to the code symbols modified by the removal of the processed
input symbol as output symbols. Output symbols of degree one may cover additional input symbols
and thus modify the ripple. Hence, the distribution of output symbol degrees changes to $\Omega_1(x)$.
At each subsequent step of the decoding process one input symbol in the ripple is processed by
being removed from all neighboring output symbols, and all such output symbols that subsequently
have exactly one remaining neighbor are released to cover that neighbor. Consequently, the support
of the output symbol degrees after $\ell$ input symbols have been processed is $d \in \{1, \dots, k - \ell\}$,
and the resulting output degree distribution is denoted by $\Omega_\ell(x)$. Our analysis of the presented BP
decoding process is based on the assumption that the ripple size relative to the number of higher
degree symbols is small enough throughout the process. Consequently, we can ignore the presence
of defective ripple symbols (redundant degree-one symbols) [?]. Hence, the number of decoded
symbols is increased by one with each processed ripple symbol. Now, let us assume that input
symbols to be processed are not taken from the ripple, but instead provided to the decoder as side
information. We refer to this mechanism of processing input symbols obtained as side information
as doping. In particular, to unlock the belief propagation process stalled at time (iteration) $t$, the
degree-two doping strategy selects the doping symbol from the set of input symbols connected to
the degree-two output symbols in graph $G_t$, as illustrated in Figure 6. Hence, the ripple evolution
is affected in a different manner, i.e., with the doping-enhanced decoding process the ripple size does
not necessarily decrease by one with each processed input symbol.
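
The decoding and doping loop described above can be sketched in a few lines of Python. The sketch is illustrative only: the function name `bp_decode_with_doping`, the representation of code symbols as sets of input-symbol indices (payloads are omitted), the fixed random seed, and the fallback that dopes an arbitrary undecoded symbol when no degree-two output symbol exists are all assumptions of this example, not details given in the paper.

```python
import random


def bp_decode_with_doping(code_symbols, k, seed=0):
    """Peel input symbols 0..k-1; return (processing order, doped symbols)."""
    rng = random.Random(seed)
    symbols = [set(s) for s in code_symbols]   # work on copies
    undecoded = set(range(k))
    order, doped = [], []
    while undecoded:
        # Ripple: input symbols covered by degree-one output symbols.
        ripple = {next(iter(s)) for s in symbols if len(s) == 1}
        if ripple:
            m = rng.choice(sorted(ripple))
        else:
            # Decoder stalled: degree-two doping picks an input symbol
            # connected to a degree-two output symbol, if one exists.
            degree_two = set().union(*[s for s in symbols if len(s) == 2])
            pool = degree_two or undecoded     # assumed fallback
            m = rng.choice(sorted(pool))
            doped.append(m)
        order.append(m)
        undecoded.discard(m)
        for s in symbols:                      # remove row m of the matrix
            s.discard(m)
    return order, doped


# A chain that decodes without doping, and a cycle that needs one doping.
chain_order, chain_doped = bp_decode_with_doping(
    [{0}, {0, 1}, {1, 2}, {2, 3}], k=4)
cycle_order, cycle_doped = bp_decode_with_doping(
    [{0, 1}, {1, 2}, {2, 3}, {0, 3}], k=4)
```

In the cycle example the ripple is empty at the start, so a single doped symbol releases two degree-two output symbols and the remaining symbols are then decoded by ordinary peeling.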

The following subsections study the behavior of both varieties of the BP decoding process, 
first through the evolution of symbol degrees higher than one, and in particular by demonstrating 
the ergodicity of the Ideal Soliton degree distribution, then by modeling and analyzing the ripple 
process, resulting in a unified model for both classical and doping-enhanced decoding. Based on
that model, we analyze the collection cost of the presented decoding strategies, when the starting
$\omega(d)$ is Ideal Soliton.

B. Symbol Degree Evolution 

In this subsection, we focus on the evolution of symbol degrees higher than one (unreleased 
symbols), and then analyze ripple evolution separately in the next subsection. The analysis of the 
evolution of unreleased output symbols is the same for both classical BP decoding case (without 
doping), and the doped BP decoding. We now present the model of the doping (decoding) process 
through the column degree distribution at each decoding/doping round. We model the $\ell$th step
of the decoding/doping process by selecting a row uniformly at random from the set of $(k - \ell)$
rows in the current decoding matrix $S_\ell = [s_{ij}]_{(k-\ell) \times k_s}$ and removing it from the matrix. After $\ell$
rounds or, equivalently, when there are $k - \ell$ rows in the decoding matrix, the number of ones in
a column is denoted by $A_{k-\ell}$. The probability that the column is of degree $d$, when its length is
$k - \ell - 1$, $\ell \in \{1, \dots, k - 3\}$, is described iteratively

$$P(A_{k-\ell-1} = d) = P(A_{k-\ell} = d)\left(1 - \frac{d}{k-\ell}\right) + P(A_{k-\ell} = d+1)\,\frac{d+1}{k-\ell} \tag{1}$$

for $2 \leq d < k - \ell$, and $P(A_{k-\ell-1} = k - \ell) = 0$. Let the starting distribution of the column degrees
(for the decoding matrix $S_0 = [s_{ij}]_{k \times k_s}$) be Ideal Soliton, denoted by $\rho(d)$,

$$\rho(d) = \frac{1}{d(d-1)} \quad \text{for } d = 2, \dots, k, \tag{2}$$

and $\rho(1) = 1/k$. By construction, for $\ell = 0$, $P(A_k = d) = \rho(d)$, which, together with (1), completely
defines the dynamics of the doping process when the Fountain code is based on the Ideal Soliton.
After rearranging and canceling appropriate terms, we obtain, for $d \geq 2$,

$$P(A_{k-\ell} = d) = \begin{cases} \dfrac{k-\ell}{k}\,\rho(d), & d = 2, \dots, k-\ell, \\ 0, & d > k-\ell. \end{cases} \tag{3}$$

We assume that $k_s \approx k$ as, by design, we desire to have the set of upfront collected symbols
$k_s$ as small as the set of source symbols. The probability of degree-$d$ symbols among the
$n_u = k_s - \ell$ unreleased symbols can be approximated with
$\frac{P(A_{k-\ell} = d)\,k_s}{k_s - \ell} \approx \frac{P(A_{k-\ell} = d)\,k}{k - \ell}$. Hence, the probability
distribution $\omega_\ell(d)$ of the unreleased output node degrees at any time $\ell$ remains the Ideal Soliton

$$\omega_\ell(d) = \frac{k}{k-\ell}\,P(A_{k-\ell} = d) = \rho(d) \quad \text{for } d = 2, \dots, k-\ell. \tag{4}$$
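
The shape-preserving property in (3) and (4) can be checked numerically by iterating the recursion (1) with exact rational arithmetic. The following Python sketch is a verification aid under our own naming (`ideal_soliton_exact`, `evolve_column_degrees`); it is not code from the paper.

```python
from fractions import Fraction


def ideal_soliton_exact(k):
    """Ideal Soliton as exact fractions: rho(1) = 1/k, rho(d) = 1/(d(d-1))."""
    rho = {1: Fraction(1, k)}
    for d in range(2, k + 1):
        rho[d] = Fraction(1, d * (d - 1))
    return rho


def evolve_column_degrees(k, steps):
    """Apply recursion (1) `steps` times to the degrees d >= 2.

    Returns a dict d -> P(A_{k-steps} = d) for d = 2..k-steps.
    """
    p = {d: v for d, v in ideal_soliton_exact(k).items() if d >= 2}
    for l in range(steps):
        rows = k - l                       # current column length
        nxt = {}
        for d in range(2, rows):           # 2 <= d < k - l
            stay = p.get(d, Fraction(0)) * (1 - Fraction(d, rows))
            drop = p.get(d + 1, Fraction(0)) * Fraction(d + 1, rows)
            nxt[d] = stay + drop
        p = nxt
    return p


k, steps = 20, 7
evolved = evolve_column_degrees(k, steps)
# Renormalizing by k/(k - steps), as in (4), recovers rho(d) exactly.
renormalized = {d: v * Fraction(k, k - steps) for d, v in evolved.items()}
```

For these example values every entry of `renormalized` equals $1/(d(d-1))$, which is the statement that the unreleased degrees remain Ideal Soliton.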

C. Doped Ripple Evolution: Random Walk Model

There exist comprehensive and thorough analytical models for the ripple evolution, characterizing
the decoding of LT codes [?], [?]. However, their comprehensive nature results in complex models
that are difficult to evaluate. For describing the dynamics of a doped decoder, we consider a simpler model,
which attempts to capture the ripple evolution for the Ideal Soliton. Figure 7 and the code symbol
degree evolution analysis illustrate how the Ideal Soliton distribution maintains its shape with
decoding/doping. This fact, which results in a tractable ripple analysis and, more importantly, in an
outstanding performance as illustrated in the last section, is our main motivator for selecting Ideal
Soliton Fountain codes for our doping scheme. We study the number of symbols decoded between
two dopings and, consequently, characterize the sequence of interdoping yields. The time at which
the $i$th doping occurs (or, equivalently, the decoding stalls for the $i$th time) is a random variable
$T_i$, and so is the interdoping yield $Y_i = T_i - T_{i-1}$. Our goal is to obtain the expected number
of times the doping will occur by studying the ripple evolution. This goal is closely related to (a
generalization of) the traditional studies of the fountain code decoding which attempt to determine
the number of collected symbols $k_s$ required for the decoding to be achieved without a single
doping iteration, i.e., when $T_1 \geq k$.

Let the number of upfront collected coded symbols be $k_s = (1 + \delta)k$, where $\delta$ is a small positive
value. At time $\ell$ the total number of decoded and doped symbols is $\ell$, and the number of (unreleased)
output symbols is $n = k_s - \ell = \lambda_\ell (k - \ell)$. Here, $\lambda_\ell = 1 + \delta \frac{k}{k-\ell}$ is an increasing function of $\ell$.
The unreleased output symbol degree distribution polynomial at time $\ell$ is $\Omega_\ell(x) = \sum_d \Omega_{d,\ell}\, x^d$, where
$d = 2, \dots, k - \ell$, and $\Omega_{d,\ell} = \omega_\ell(d)$. In order to describe the ripple process evolution, in the following
we first characterize the ripple increment when $\ell$ corresponds to a decoding iteration and, next, when it
corresponds to a doping iteration.

Each decoding iteration processes a random symbol of degree one from the ripple. Since the
encoded symbols are constructed by independently combining random input symbols, we can
assume that the input symbol covered by the degree-one symbol is selected uniformly at random
from the set of undecoded symbols. Released output symbols are its coded symbol neighbors
whose output degree is two. Releasing output symbols by processing a ripple symbol corresponds
to performing, on average, $n_2 = n\Omega_{2,\ell}$ independent Bernoulli experiments with probability of success
$p_2 = 2/(k - \ell)$. Hence, the number of released symbols at any decoding step $\ell$ is modeled by a
discrete random variable $\Delta^{(\ell)}$ with Binomial distribution $B(n\Omega_{2,\ell}, 2/(k - \ell))$, which for large $n$
can be approximated with a (truncated) Poisson distribution of intensity $2\Omega_{2,\ell}\lambda_\ell$:

$$\Pr\{\Delta^{(\ell)} = r\} = \binom{n_2}{r} p_2^{\,r} (1 - p_2)^{n_2 - r} \approx \frac{(2\Omega_{2,\ell}\lambda_\ell)^r\, e^{-2\Omega_{2,\ell}\lambda_\ell}}{r!}, \tag{5}$$

where we have first applied the Stirling approximation to the Binomial coefficient and, also, assumed
that the probabilities in (5) can be neglected unless $n_2$ is much larger than $r$. According to (4),
the fraction of degree-two output symbols for an Ideal Soliton based Fountain code is expected to be
$n_2/n \approx \Omega_{2,\ell} = \rho(2) = 1/2$, for any decoding iteration $\ell$. Hence,

$$\Pr\{\Delta^{(\ell)} = r\} = \eta(r) = \frac{\lambda_\ell^{\,r}\, e^{-\lambda_\ell}}{r!}, \quad r = 0, \dots, n/2, \tag{6}$$

or, equivalently, $\Delta^{(\ell)} \sim \wp(\lambda_\ell)$, where $\wp(\cdot)$ denotes the Poisson distribution. For each decoding
iteration, one symbol is taken from the ripple and $\Delta^{(\ell)}$ symbols are added, so that the increments
of the ripple process can be described by random variables $X_\ell = \Delta^{(\ell)} - 1$ with the probability
distribution $\eta(r + 1)$ (for $X_\ell = r$) characterized by the generating polynomial $I(x) = \sum_{d \geq 0} \eta(d)\, x^{d-1}$
and an expected value $\lambda_\ell - 1$. Next we describe the ripple increment for the doping iteration, where
a carefully selected input symbol is revealed at time $T_i = t_i$ when the ripple is empty (random
degree-two doping). The number of degree-two output symbols at time $T_i = t_i$ is $n_2 = \rho(2)n = n/2$,
where $n = \lambda_{t_i}(k - t_i)$. Degree-two doping selects uniformly at random a row in the decoding
matrix $S_{t_i}$ that has one or more non-zero elements in columns of degree two. This is equivalent to
randomly selecting a column of degree two to be released, and restarting the ripple (i.e., same as
decoding) with any of its two input symbols from the decoding matrix whose number of degree-two
columns is now $n_2 - 1 \approx n_2$, for large $n_2$. Hence, the doping ripple increment can be described by a
unit increase in addition to an increase equivalent to the one obtained through decoding but without
the ripple decrement of 1. That is, statistically, the doping ripple increment $X^D_{t_i}$ is a random variable
described by $I^D(x) = \sum_{d \geq 0} \eta(d)\, x^{d+1}$, corresponding to the shifted distribution $\eta(r - 1)$ for $X^D_{t_i} = r$.

Now if, for the doping instant $t = t_{i-1}$, we define $X_{t_{i-1}} = X^D_{t_{i-1}} - 2$, the ripple size for
$t \in [t_{i-1}, t_i]$ can be described in a unified manner with $S_{t,i} + 2$, where

$$S_{t,i} = \sum_{j = t_{i-1}}^{t} X_j \tag{7}$$

is a random walk modeling the ripple evolution. Note that the ripple increments $X_j$ are not IID
random variables, since the intensity of $\eta(d)$ changes with each iteration. However, for analytical
tractability, we study the interdoping time using the random walk model in (7), by assuming that
$\lambda_\ell$ changes from doping to doping, but remains constant within the interdoping interval. Under
this assumption, the ripple size $S_{t,i} + 2$ is a partial sum of IID random variables $X_j$ of the expected
value $\lambda_{T_{i-1}} - 1$. Note that, when $\delta = 0$, i.e., when $k_s = k$, $S_{t,i}$ is a zero-mean random walk. In
this special case, we treat the doping-enhanced BP process as (an approximate) renewal process,
where the process starts all over after each doping. Modeling and analyzing this particular case is
much easier, resulting in a closed-form expression for the expected number of dopings. We later
refer to this case to provide some intuition. The expected interdoping yield is the expected time
it takes for the ripple random walk $S_{t,i} + 2$ to become zero. Using random walk terminology, we
are interested in the statistics of the random-walk stopping time. The stopping time is the time at
which the decoding process stalls, counting from the previous doping time, where the first decoding
round starts with the 0th doping, which occurs at $T_0 = 0$. Hence, the $i$th stopping time (doping)
$T_i$ is defined as

$$T_i = \min\left\{\min\{t_i : S_{t_i,i} + 2 \leq 0\},\; k\right\}. \tag{8}$$

We study the Markov Chain model of the random walk $S_{t,i}$. Each possible value of the random
walk represents a state of the Markov Chain (MC) described by the probability transition matrix
$P_i$. State $v$, $v \in \{1, \dots, k\}$, corresponds to the ripple of size $v - 1$. State 1 is the trapping state,
with the (auto)transition probability $P_{i,11} = 1$, and models the stopped random walk. Hence, based
on (6), we have the state transition probabilities

$$P_{i,11} = 1, \qquad P_{i,v(v+b)} = \eta(1 + b), \quad v = 2, \dots, k, \quad b = -1, \dots, \min(\lceil n/2 \rceil, k - v), \tag{9}$$

and $P_{i,vw} = 0$ otherwise, resulting in a transition probability matrix of the following almost Toeplitz
form

$$P_i = \begin{bmatrix} 1 & 0 & 0 & 0 & \cdots \\ \eta(0) & \eta(1) & \eta(2) & \eta(3) & \cdots \\ 0 & \eta(0) & \eta(1) & \eta(2) & \cdots \\ \vdots & & \ddots & \ddots & \\ 0 & \cdots & 0 & \eta(0) & \eta(1) \end{bmatrix}_{k \times k} \tag{10}$$

with $\eta(\cdot) = \wp(\lambda_{T_{i-1}})$. The start of the decoding process is modeled by the MC being in the initial
state $v = 3$ (equivalent to the ripple of size two). Based on that, the probability of being in the
trapping state, while at time $t > T_i$, is

$$p^{(T_i)}_{t} = [0\ 0\ 1\ 0 \cdots 0]\; P_i^{\,t - T_i}\; [1\ 0 \cdots 0]^T. \tag{11}$$

Hence, the probability of entering the trapping state at time $t$ is

$$p^{T_i}(u) = p^{(T_i)}_{T_i + u} - p^{(T_i)}_{T_i + u - 1} = [0\ 0\ 1\ 0 \cdots 0] \left(P_i^{\,u} - P_i^{\,u-1}\right) [1\ 0 \cdots 0]^T, \tag{12}$$

where $u = t - T_i$. $\{T_i\}$ is a sequence of stopping-time random variables where index $i$ identifies
a doping round. $Y_i = T_i - T_{i-1}$, $i \geq 1$, is a stopping time interval of a random walk of (truncated)
Poisson IID random variables of intensity $\lambda_{T_{i-1}} = 1 + \delta \frac{k}{k - T_{i-1}}$ and can be evaluated using the
following recursive probability expression

$$\Pr\{Y_i = 0\} = 0, \qquad \Pr\{Y_i = 1\} = 0, \qquad \Pr\{Y_i = t + 1\} = \eta(0)\, R_i(t), \quad 1 \leq t < k, \tag{13}$$

$$R_i(t) = \wp^{(t)}(t - 1) - \sum_{j=1}^{t-1} \Pr\{Y_i = t - j\}\, \wp^{(j)}(1 + j),$$

obtained from (12) after a series of matrix transformations. Here, $\eta(0)$ is the Poisson pdf of intensity
$\lambda_{T_{i-1}}$ evaluated at 0, and $\wp^{(s)}(d)$ is the $s$-tuple convolution of $\eta(\cdot)$ evaluated at $d$, resulting in
a Poisson pdf of intensity $s\lambda_{T_{i-1}}$ evaluated at $d$. The complete derivation of (13) is given in the
Appendix. Note that the intensity $s\lambda_{T_{i-1}}$ is, in general, a random variable and that the sequence
of doping times $T_i$ is a Markov chain. Hence, the number of decoded symbols after the $h$th doping, a
partial sum $D_h = \sum_{i=1}^{h} Y_i$ of interdoping yields, is a Markov-modulated random walk.

The expected number of dopings sufficient for complete decoding is the stopping time of the
random walk $D_h$, where the stopping threshold is $k - u_l$. Here, based on the coupon collection
model, $u_l$ is the expected number of uncovered symbols (which, necessarily, have to be doped)
when $k_s$ coded symbols are collected




(14) 



The total number of dopings is the stopping time random variable $D$ defined as

$$D = \min\{h : D_h + u_l \geq k\}. \tag{15}$$

Our model can further be simplified by replacing $T_{i-1}$ with $l_i = \sum_{t=1}^{i-1} E[Y_t \mid T_{t-1} = l_t]$ in the
intensity $\lambda_{T_{i-1}}$ in (13), thus allowing for a direct recursive computation in (15). Hence,

$$E[Y_i \mid T_{i-1} = l_i] \approx \sum_{t \geq 1} t \Pr\{Y_i = t\}. \tag{16}$$

Furthermore, we can approximate $D_h \approx l_{h+1} = \sum_{i=1}^{h} E[Y_i \mid T_{i-1} = l_i]$ and use the algorithm in
Figure 16 (based on (15)) to calculate the expected number of dopings.




In the special case when $\delta = 0$, further simplifying assumptions lead to the approximation that all
interdoping yields are described by a single random variable $Y$ whose pdf is given by the following
recursive expression, based on (13):

$$\Pr\{Y = t + 1\} = \eta(0) \left[ \wp^{(t)}(t - 1) - \sum_{i=1}^{t-1} \Pr\{Y = t - i\}\, \wp^{(i)}(1 + i) \right], \tag{17}$$

where $\wp^{(s)}(d)$ denotes the Poisson distribution of intensity $s$, evaluated at $d$, and $t \in [0, k - 1]$. The
range of $t$ varies from doping to doping, i.e., if $T_{i-1} = l_i$, then $Y_i$ would have support $t \in [l_i, k - 1]$,
and, hence, this single-variable approximation is accurate for the case when both the ripple size is
small and $l_i \ll k$. We now approximate the expected value of the interdoping yield $Y$ as

$$E[Y] \approx \sum_{t=1}^{k} t \Pr\{Y = t\} + \left(1 - \sum_{t=1}^{k} \Pr\{Y = t\}\right) k. \tag{18}$$

Now, the doping process $D_h$ is a renewal process, and thus, the Wald Equality [?] implies that the
mean stopping time is $E[D] = k/E[Y]$.
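
The recursion for the interdoping yield and the resulting estimate of the expected number of dopings can be evaluated directly. The Python sketch below is a minimal illustration under stated assumptions: it keeps the Poisson intensity fixed (the default `lam = 1.0` corresponds to the case $\delta = 0$), takes $\Pr\{Y = 0\} = \Pr\{Y = 1\} = 0$ because the walk starts from a ripple of size two, and uses function names of our own choosing.

```python
import math


def poisson_pmf(intensity, d):
    """Poisson probability of the value d, computed in log-space."""
    if d < 0:
        return 0.0
    if intensity == 0:
        return 1.0 if d == 0 else 0.0
    return math.exp(-intensity + d * math.log(intensity) - math.lgamma(d + 1))


def interdoping_yield_pmf(k, lam=1.0):
    """Return pr[0..k] with pr[y] = Pr{Y = y}, via the recursive expression."""
    pr = [0.0] * (k + 1)            # Pr{Y=0} = Pr{Y=1} = 0
    eta0 = poisson_pmf(lam, 0)
    for t in range(1, k):
        # t-fold convolution of Poisson(lam) is Poisson(t * lam).
        r = poisson_pmf(t * lam, t - 1)
        for i in range(1, t):
            r -= pr[t - i] * poisson_pmf(i * lam, i + 1)
        pr[t + 1] = eta0 * r
    return pr


def expected_dopings(k, lam=1.0):
    """E[D] = k / E[Y], with the residual probability mass placed at k."""
    pr = interdoping_yield_pmf(k, lam)
    mass = sum(pr)
    mean_yield = sum(t * p for t, p in enumerate(pr)) + (1.0 - mass) * k
    return k / mean_yield


pmf = interdoping_yield_pmf(50)
dopings_estimate = expected_dopings(200)
```

The double loop costs on the order of $k^2$ Poisson evaluations, which is acceptable for the network sizes considered here; the quality of the estimate is limited by the renewal approximation itself.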

VI. Comparative Cost Analysis

The summary of the proposed approach to dissemination, storage, and collection with doping,
based on IS combining for storage, and a random degree-two doping for the collection strategy, is
given in Figure 8. We here analyze the performance of this approach in terms of data collection
cost. The per-packet cost of the upfront collection from the nearby nodes in the supersquad, $1 + (s - 1)/4$, is
significantly smaller than the collection cost when the packets are polled from their original source
relays, which is on average $k/4$. Nevertheless, in this section, we show that the number of doped
packets $k_d$ will be sufficiently smaller than the residual number of undecoded symbols when the
belief propagation process first stalls, so that their collection cost is offset, and the overall collection
cost is reduced relative to the original strategy. We quantify the performance of the decoding process
through the doping ratio $k_d/k$. Figure 9 illustrates the dramatic reduction of the overhead $(k_T(k_d) - k)/k$
when employing doping with an IS distribution relative to the overhead of RS encoding without
doping. Figure 10 demonstrates that RS with doping performs markedly worse than IS encoding. In
particular, it illustrates that IS with doping demonstrates a very low variance, which is surprisingly
different from the results without doping.

In Section II we characterized a circular squad network by its node density $\mu$ and its source density
$\mu_s$, so that network scaling can be expressed through the scaling of the coverage redundancy factor
$h = \mu/\mu_s$. Figure 11 illustrates the importance of considering coverage redundancy when selecting
a storage/collection strategy: in the case of degree-two dissemination, when the size of the supersquad
$s$ increases, the fountain code strategy improves (in terms of a reduced doping $k_d/k$ required for
decoding) due to an increase in mixing. As a result of the mechanism described in Figures 4 and 5,
the degree-two combinations in adjacent squads have a significant number of common packets.
Hence, when forming code symbols by combining degree-two packets, we encounter a higher code
symbol dependency and an increased number of redundant symbols in the ripple, which increases
the probability of uncovered input symbols. Our doping overhead accounts for uncovered symbols,
since ultimately they need to be pulled off the original sources for the complete data recovery.
With increased supersquad size $s$ (or, equivalently, a decreased $h$) the mixing of input symbols
is improved and this negative effect is alleviated. This dependency is not present in the case of
degree-one dissemination. Figure 12 gives the corresponding required doping $k_d/k$ as a function of $k$
for a fixed squad size $h = 200$.

The cost minimization problem for any encoding scheme with (and without) doping is described as
follows. Let the pair $(k_s, k_d)$ be a feasible number of encoded and doped packets, i.e., one that is sufficient
for decoding the original $k$ packets. The per-source packet collection cost for this pair is

$$c_T(h) = [c_s(h)\,k_s + c_d\,k_d]/k, \tag{19}$$

where $c_s(h) = 1 + (s(h) - 1)/4$ is the average collection cost from the supersquad of size
$s(h) = \lceil k_s/h \rceil$ and $c_d = \lceil k/4 \rceil$ is the average collection cost when polling doped packets from
the original source relays. Examples of $(k_s, k_d)$ pairs are $(0, k)$ for the pure polling mechanism
with cost $c_T(h) = c_d = \lceil k/4 \rceil$, and $(k_s = k + \sqrt{k}\log^2(k/\delta), 0)$ on average for degree-one
dissemination and RS fountain encoding with average per-packet cost $c_T = c_s(h)\,k_s/k$. For any
given encoding mechanism and the set of feasible pairs $(k_s, k_d)$, the minimum per-packet collection
cost is $c_{\min}(h) = \min_{(k_s, k_d)} c_T(h)$. The effect on the doping percentage of increasing the number
of upfront collected symbols $k_s$ above $k$ (described by our general model of interdoping times)
is illustrated in Figure 13. Figure 14 illustrates the per-packet collection cost above minimum, based
on (19), as a function of the number of packets $(k_s - k)/k$ collected from the supersquad in
excess of $k$, for different values of coverage redundancy and IS encoding. For the range of
coverage redundancies that may be of practical value (up to 50), the minimum collection cost is
obtained for $k_{s,\min}/k \in (1, 1.05)$. Figure 15 illustrates the per-packet cost $c_T(h)/(k/4)$ normalized
to the reference polling cost as a function of $\mu_s/\mu = 1/h$, the relative density of source nodes,
for a network with $k = 2000$ source packets. Four strategies are included, all based on degree-one
packet dissemination: reference polling, degree-one coupon collection, RS with no doping, and the
IS encoding with a feasible doping pair $(k_s, k_d)$. Note that the proposed scheme is inferior to the
RS-based scheme only for a very low density of events, i.e., when $h > 1000$.
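
The per-packet cost in (19) and its minimization over feasible pairs are simple to evaluate. The following Python sketch illustrates the arithmetic; the function names and the example pairs are hypothetical and chosen only for this illustration, and which pairs are actually feasible must come from the decoding analysis or from simulation.

```python
import math


def collection_cost(k, k_s, k_d, h):
    """Per-source-packet cost: (c_s(h) * k_s + c_d * k_d) / k."""
    s = math.ceil(k_s / h)        # supersquad size s(h)
    c_s = 1 + (s - 1) / 4         # average hops inside the supersquad
    c_d = math.ceil(k / 4)        # average hops when polling a source relay
    return (c_s * k_s + c_d * k_d) / k


def min_cost(k, h, feasible_pairs):
    """Return (cost, pair) minimizing the cost over the given (k_s, k_d) pairs."""
    return min((collection_cost(k, k_s, k_d, h), (k_s, k_d))
               for k_s, k_d in feasible_pairs)


# Hypothetical example: pure polling versus upfront collection plus 20 dopings.
polling_cost = collection_cost(2000, 0, 2000, 200)
doped_cost = collection_cost(2000, 2000, 20, 200)
best = min_cost(2000, 200, [(0, 2000), (2000, 20)])
```

With these assumed numbers the pure polling pair costs $\lceil k/4 \rceil = 500$ hops per packet, while the doped pair costs 8.25, which shows how a small number of expensive dopings can be offset by cheap upfront collection.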

In conclusion, in this paper we showed that, for the circular squad network, the total collection 
cost could be reduced by applying a packet combining degree distribution that is congruous to 
doping, applying a good doping mechanism, and by balancing the cost of upfront collection and 
doping, given coverage redundancy factor. The proposed network model that includes a route of 
relays and the nodes overhearing relays' transmissions is chosen based on a range of sensor network 
applications that monitor physical phenomena with a linear spatial blueprint, such as road networks
and border-security sensor nets. In order to limit the scope of the paper, we here omit describing
a more general setup in which our network model can be used. However, we argue that networks 
of different (non-linear) topology may use dissemination mechanisms that produce shortest routing 
paths from data sources to the collection node, suggesting a cost collection analysis based on these 
"linear route networks" and, hence, similar to the one presented here. This is one of the reasons 
we treat data dissemination separately from data collecting in our cost analysis. 

Appendix 

Random Walk Ripple Evolution: The Stopping Time Probability 

Recall that for the Markov Chain model of the ripple evolution, described by (9) and (10), its
trapping state corresponds to the empty ripple. The probability of entering the trapping state at time
$t$, where $t > T_i$, is given in (12), where $u = t - T_i$. The probability of being in the trapping state
at $T_i + u$ can also be expressed as $p^{(T_i)}_{T_i + u} = [0\ 0\ 1\ 0 \cdots 0]\, P_i^{\,u-1}\, [1\ \eta(0)\ 0 \cdots 0]^T$. Hence, we can
reformulate (12) as

$$p^{T_i}(u) = [0\ 0\ 1\ 0 \cdots 0]\, P_i^{\,u-1}\, [0\ \eta(0)\ 0 \cdots 0]^T. \tag{20}$$

Note that both $[0\ 0\ 1\ 0 \cdots 0]$ and $[0\ \eta(0)\ 0 \cdots 0]$ have zero-valued first elements, which means
that the first row and the first column of the transition probability matrix $P_i$ do not contribute to
the value of (20). Hence, we introduce a new matrix $\tilde{P}_i$ which contains the significant elements of
$P_i$ as

$$\tilde{P}_i = \begin{bmatrix} \eta(1) & \eta(2) & \eta(3) & \cdots \\ \eta(0) & \eta(1) & \eta(2) & \cdots \\ 0 & \eta(0) & \eta(1) & \cdots \\ \vdots & & \ddots & \\ 0 & \cdots & \eta(0) & \eta(1) \end{bmatrix}_{(k-1) \times (k-1)} \tag{21}$$

with $\eta(\cdot) = \wp(\lambda_{T_{i-1}})$. Now,

$$p^{T_i}(u) = \eta(0)\, [0\ 1\ 0 \cdots 0]\, \tilde{P}_i^{\,u-1}\, [1\ 0 \cdots 0]^T. \tag{22}$$

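Assuming n is large, we can approximately express the uth power of the matrix $\tilde{P}_i$ through a
matrix that contains elements $\eta^{(u)}(\cdot)$ of the uth convolution of the pdf array $\eta = [\eta(0)\ \eta(1)\ \cdots]$.
Let us define $\eta$ as the degree-one convolution. For the order-two convolution, we convolve $\eta$ with itself,
and the uth convolution of $\eta$ is obtained by recursively convolving the (u − 1)th convolution with $\eta$. By
multiplying the matrix

$$P_i^{C} = \begin{bmatrix}
\eta(0) & \eta(1) & \eta(2) & \cdots \\
 & \eta(0) & \eta(1) & \cdots \\
 & \cdots & \eta(0) & \eta(1)
\end{bmatrix}, \tag{23}$$

which was obtained by adding the column $[\eta(0)\ 0\ \cdots]^T$ in front of $\tilde{P}_i$, and another matrix

$$P_i^{R} = \begin{bmatrix}
\eta(2) & \eta(3) & \eta(4) & \cdots \\
\eta(1) & \eta(2) & \eta(3) & \cdots \\
\eta(0) & \eta(1) & \eta(2) & \cdots \\
 & \eta(0) & \cdots &
\end{bmatrix}, \tag{24}$$

which was obtained by adding the row $[\eta(2)\ \eta(3)\ \eta(4)\ \cdots]$ above $\tilde{P}_i$, we obtain

$$\begin{align}
P_i^{C} P_i^{R} &= \begin{bmatrix}
\eta^{(2)}(2) & \eta^{(2)}(3) & \cdots \\
\eta^{(2)}(1) & \eta^{(2)}(2) & \cdots \\
\eta^{(2)}(0) & \eta^{(2)}(1) & \cdots
\end{bmatrix} \tag{25} \\
&= D^{(2)}, \tag{26}
\end{align}$$

where $\eta^{(s)}(d)$ is the s-th convolution of $\eta(\cdot)$ evaluated at $d$, and $D^{(2)}$ is what we refer to as the second
convolution matrix of $\eta$. Hence,

$$\begin{align}
\tilde{P}_i^{2} &= D^{(2)} - \eta(0) \begin{bmatrix} \eta(2) & \eta(3) & \cdots \\ 0 & 0 & \cdots \\ \vdots & \vdots & \end{bmatrix} \notag \\
&= D^{(2)} - [\eta(0)\ 0 \cdots]^T\, [\eta^{(1)}(2)\ \eta^{(1)}(3)\ \eta^{(1)}(4) \cdots]. \tag{27}
\end{align}$$

By induction,

$$\begin{align}
\tilde{P}_i^{3} &= D^{(3)} - [\eta(0)\ 0 \cdots]^T\, [\eta^{(2)}(3)\ \eta^{(2)}(4) \cdots]
 - \tilde{P}_i\, [\eta(0)\ 0 \cdots]^T\, [\eta^{(1)}(2)\ \eta^{(1)}(3) \cdots], \notag \\
\tilde{P}_i^{u} &= D^{(u)} - \sum_{z=2}^{u} \tilde{P}_i^{\,u-z}\, [\eta(0)\ 0 \cdots]^T\, [\eta^{(z-1)}(z)\ \eta^{(z-1)}(z+1) \cdots]. \tag{28}
\end{align}$$

Replacing (27) in (?) we obtain (13).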
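The convolution-matrix expansion of the matrix powers derived above can be checked numerically on a small example. The sketch below is only illustrative: the four-point pdf, the matrix size m = 12, and all function names are assumptions. It uses the entry pattern $\tilde{P}_i[a, b] = \eta(b - a + 1)$ and $D^{(u)}[a, b] = \eta^{(u)}(b - a + u)$ (0-based indices), and compares only the top-left block, where the finite truncation has no effect.

```python
import numpy as np

def conv_power(eta, s):
    """s-th convolution of eta with itself (s = 1 returns eta)."""
    out = np.array(eta, dtype=float)
    for _ in range(s - 1):
        out = np.convolve(out, eta)
    return out

def at(arr, d):
    """Value of a pdf array at d, zero outside its support."""
    return arr[d] if 0 <= d < len(arr) else 0.0

def reduced_matrix(eta, m):
    """Reduced transition matrix: entry [a, b] equals eta(b - a + 1)."""
    return np.array([[at(eta, b - a + 1) for b in range(m)] for a in range(m)])

def conv_matrix(eta, u, m):
    """u-th convolution matrix: entry [a, b] equals eta^(u)(b - a + u)."""
    eu = conv_power(eta, u)
    return np.array([[at(eu, b - a + u) for b in range(m)] for a in range(m)])

def power_from_convolutions(eta, u, m):
    """Right-hand side of the general expansion (28), for u >= 2."""
    P = reduced_matrix(eta, m)
    c = np.zeros(m)
    c[0] = eta[0]  # the column [eta(0) 0 ...]^T
    total = conv_matrix(eta, u, m)
    for z in range(2, u + 1):
        ez = conv_power(eta, z - 1)
        r = np.array([at(ez, z + b) for b in range(m)])  # row [eta^(z-1)(z) ...]
        total -= np.linalg.matrix_power(P, u - z) @ np.outer(c, r)
    return total

eta_demo = [0.4, 0.3, 0.2, 0.1]  # hypothetical pdf, chosen only for the check
m_demo = 12
direct = {u: np.linalg.matrix_power(reduced_matrix(eta_demo, m_demo), u) for u in (2, 3, 4)}
expanded = {u: power_from_convolutions(eta_demo, u, m_demo) for u in (2, 3, 4)}
```

For u = 2 the top-left entry reduces to $\eta(1)^2 + \eta(0)\eta(2)$ in both computations, which is a convenient hand check.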
Fig. 1. Collection of coded symbols: the pull phase brings the three squads of coded packets to the decoder, and then, whenever the
decoder stalls, an original symbol is pulled off the network for doping. We here deliberately omit to show that the squad
nodes are overhearing (belonging to) two adjacent relays in order to highlight the two-phase collection, as opposed to the storage
protocol.

Fig. 2. Close-Up of a Circular Squad Network of k relays. We assume there is one source per relay on average - the source 
entrusts its data packet to the closest relay, hence making it a virtual source. Each relay is overheard by nodes in its transmission 
range, referred to as squad nodes. 





Fig. 3. Circular Squad Network: the storage graph. 

Fig. 4. The dissemination procedure brings all network data to each relay in half as many hops as would be needed with a simple
forwarding scheme: the example for k = 7 follows the exchanges of node 1, where the black circle on the bottom represents the node's
receiver while each gray circle above it represents the transmitter at the corresponding dissemination round.

Initialization:

k = 1: Relay i sends its own packet $p_i$, and subsequently receives the packets $p_{(i-1)}$ and $p_{(i+1)}$ originating from
its first-hop neighbors.

k = 2: Relay i sends a linear combination (XOR) of the received packets $p_{(i-1)}$ and $p_{(i+1)}$, and subsequently
receives the packets containing $p_i$ XOR-ed with the packets $p_{(i-2)}$ and $p_{(i+2)}$ originating from its
second-hop neighbors, respectively. Relay i recovers $p_{(i-2)}$ and $p_{(i+2)}$ by XOR-ing the received linear
combinations with $p_i$.

For (k = 3, k < (n+1)/2, k++)

Online Decoding

The packets received by relay i in the (k − 1)th round contain linear combinations of packets $p_{(i-k+2)}$ and
$p_{(i+k-2)}$ and packets $p_{(i-k+1)}$ and $p_{(i+k-1)}$, originating from its (k − 1)th hop neighbors. XOR-ing the
received packets with the matching packets $p_{(i-k+2)}$ and $p_{(i+k-2)}$, the relays recover the packets $p_{(i-k+1)}$
and $p_{(i+k-1)}$.

Storing

The buffer space is updated with the recovered original packets $p_{(i-k+1)}$ and $p_{(i+k-1)}$. For k > 3 the
buffer space is updated by overwriting packets $p_{(i-k+4)}$ and $p_{(i+k-4)}$.

Encoding

In the kth round, relay i linearly combines packets $p_{(i-k+1)}$ and $p_{(i+k-1)}$, and transmits the linear
combination.

Fig. 5. Degree-two Dissemination Algorithm 
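The degree-two dissemination of Fig. 5 can be simulated in a few lines. The sketch below is a simplified illustration, not the exact buffer management of the algorithm: packets are modeled as integers, every relay keeps all recovered packets instead of overwriting old ones, the number of relays n is assumed odd, and the indexing is our own, derived from the k = 1 and k = 2 steps (in round r each relay transmits the XOR of the two packets it learned in round r − 1).

```python
def disseminate(packets):
    """Simulate degree-two XOR dissemination on a circle of n relays (n odd).

    Returns (known, rounds), where known[i] maps a relay index to the packet
    that relay i has recovered from it, and rounds is the number of rounds used.
    """
    n = len(packets)
    assert n % 2 == 1, "this sketch assumes an odd number of relays"
    known = [{i: packets[i]} for i in range(n)]

    # Round 1: every relay sends its own packet to both neighbors.
    for i in range(n):
        known[i][(i - 1) % n] = packets[(i - 1) % n]
        known[i][(i + 1) % n] = packets[(i + 1) % n]
    rounds = 1

    for r in range(2, (n - 1) // 2 + 1):
        # Encoding: relay j XORs the two packets it recovered in round r - 1.
        tx = [known[j][(j - r + 1) % n] ^ known[j][(j + r - 1) % n] for j in range(n)]
        for i in range(n):
            left = tx[(i - 1) % n]   # p[i - r] XOR p[i + r - 2]
            right = tx[(i + 1) % n]  # p[i - r + 2] XOR p[i + r]
            # Online decoding: cancel the packet that relay i already stores.
            known[i][(i - r) % n] = left ^ known[i][(i + r - 2) % n]
            known[i][(i + r) % n] = right ^ known[i][(i - r + 2) % n]
        rounds = r
    return known, rounds

demo_packets = [0x10 + i for i in range(7)]  # seven relays, as in the example of Fig. 4
demo_known, demo_rounds = disseminate(demo_packets)
```

With seven relays, every relay holds all seven packets after three rounds, i.e., after (n − 1)/2 transmissions per relay.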




Fig. 6. In the graph Gt, representing the stalled decoding process at time t, we identify nodes on the left side (input symbols 
corresponding to rows of the incidence matrix) connected to right-hand-side nodes of degree two (output nodes corresponding to 
columns of weight two, represented by black nodes, and pointed to by black arrows), and then uniformly at random select one such 
input symbol to unlock the decoder. The set of symbols we are selecting from is represented by red nodes, indicated by red arrows. 
[Fig. 7 panels. Upper: "Modified IS after l = 500 decodings w/o deg-0 and deg-1 (decoded and the ripple)". Lower: "Ideal Soliton for k = 500 input symbols (k = 1000 − l)". Horizontal axis: degree.]
Fig. 7. Density Evolution of IS distribution due to uniform doping. The upper graph is the distribution of the output symbols
after l = 500 decodings, for an initial number of collected code symbols k = 1000; the lower graph is the IS with support set
{1, ..., (k = 1000 − l)}, as if we were starting with a matrix of the same size (initial number of collected code symbols k =
1000 − l) as the matrix doped in round l.



Dissemination and Storage:

degree-one/two dissemination of k source packets; each storage node stores a random linear combination
of d disseminated packets; d is drawn from IS p(d).

Upfront collection:

IDC collects $k_s$ encoded packets from the s closest storage squads.

Belief propagation decoding and doping-collection:

l = 0: number of processed source packets
$k_{r,l}$: number of packets in the ripple
$k_d$ = 0: number of doped packets.

For (l = 0, l < k, l++)
    while $k_{r,l}$ = 0
        Collect (from the source relay) and dope the decoder with a source packet contributing to a randomly
        selected degree-two (or larger) output packet.
        $k_d$++
    endwhile
    Process a symbol from the ripple; update $k_{r,l}$
endfor



Fig. 8. Proposed dissemination, storage, and doping collection 
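The decoding and doping-collection loop of Fig. 8 can be sketched as a peeling decoder that is doped whenever the ripple is empty. This is a toy illustration under several assumptions: packets are integers combined by XOR, coded packets are formed directly from the source packets (the dissemination and squad structure are ignored), the doped symbol is chosen uniformly among the inputs connected to degree-two outputs as in Fig. 6, and an uncovered symbol is doped when no output of degree two or larger remains. All names are hypothetical.

```python
import random

def ideal_soliton_weights(k):
    """Ideal Soliton probabilities for degrees 1..k."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def encode(source, num_coded, rng):
    """Each coded packet is [set of source indices, XOR of those source packets]."""
    k = len(source)
    weights = ideal_soliton_weights(k)
    coded = []
    for _ in range(num_coded):
        d = rng.choices(range(1, k + 1), weights=weights)[0]
        neighbors = set(rng.sample(range(k), d))
        value = 0
        for j in neighbors:
            value ^= source[j]
        coded.append([neighbors, value])
    return coded

def doped_decode(coded, source, rng):
    """Peeling decoder with doping; mutates coded. Returns (recovered, doped count)."""
    k = len(source)
    recovered = {}
    doped = 0

    def release(j, value):
        # Record symbol j and remove it from every coded packet that contains it.
        recovered[j] = value
        for c in coded:
            if j in c[0]:
                c[0].discard(j)
                c[1] ^= value

    while len(recovered) < k:
        ripple = [c for c in coded if len(c[0]) == 1]
        if ripple:
            c = ripple[0]
            release(next(iter(c[0])), c[1])
            continue
        # Decoder stalled: poll one source packet that unlocks an output packet.
        pool = [c for c in coded if len(c[0]) == 2] or [c for c in coded if len(c[0]) > 2]
        if pool:
            candidates = sorted(set().union(*(c[0] for c in pool)))
        else:
            candidates = [j for j in range(k) if j not in recovered]
        j = rng.choice(candidates)
        release(j, source[j])
        doped += 1
    return recovered, doped

demo_rng = random.Random(1)
demo_source = [demo_rng.getrandbits(16) for _ in range(60)]
demo_coded = encode(demo_source, 60, demo_rng)
demo_recovered, demo_doped = doped_decode(demo_coded, demo_source, demo_rng)
```

The returned count plays the role of $k_d$; repeating the experiment over many seeds gives an empirical doping percentage $100\,k_d/k$.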
Fig. 9. Overhead (doping) percentage: we define $k_r(k_d) = k_s + k_d$ as the number of symbols collected in both collection phases,
and the collection overhead ratio as $(k_r(k_d) - k)/k$, which allows us to compare the overhead for the simulated LT decoding of k
original symbols and the simulated degree-two doped belief-propagation decoding of k coded symbols with IS degree distribution.
The LT overhead bound is the analytical bound by Luby [?]. The IS doping bound is the analytical bound based on the algorithm
given in Figure 16.
Fig. 10. Doping percentage with initial IS code symbol degree distribution vs RS. Both mean and variance are much smaller for
Ideal Soliton.
Fig. 11. Doping percentage as a function of supersquad size when code symbols are linear combinations of degree-two packets:
for a fixed number of upfront collected symbols $k_s$ = 1000, encoded by the degree-two IS method, the squad size (node density) is
changed, so that the supersquad contains 1, 2, 5, and 10 squads. The more squads there are, the more intense is the data mixing,
decreasing the probability of non-covered original symbols.



[Fig. 12 plot. Title: "Ideal Soliton with degree-two input symbols". Horizontal axis: n, number of symbols to decode.]



Fig. 12. The encoding process emulates supersquads with fixed squad size h = 200 and the degree-two input symbols overheard
within the supersquad: the resulting doping percentage for IS degree distribution of stored code symbols.
[Fig. 13 plot. Horizontal axis: δ = $k_s$/k − 1 (in percent).]



Fig. 13. Doping percentage for different values of δ = $k_s$/k − 1. Emulation results are obtained based on our analytical model
and algorithm in Figure 16.
Fig. 14. Collection delay (hop count) above minimum per input symbol for different values of coverage redundancy h as a function
of δ. Note that there is an optimal δ for each h at which the delay is minimized: for h = 10, δ is 1%; for h = 15, it is 3%;
for h = 30, δ = 4%.
[Fig. 15 plot. Horizontal axis: number of source nodes/number of network nodes.]



Fig. 15. Collection Delay for various collection techniques, normalized with respect to the polling cost, as a function of 1/h. 
Note that the proposed doping strategy is inferior to polling only when there are no other nodes but relays. For very large squads 
(> 1000), the proposed doped IS code induces a sufficiently large polling cost (usually to start the process as the IS sample is likely 
not to have degree-one symbols) which offsets (and exceeds) the cost due to overhead packets solicited from the supersquad with 
the RS-based strategy without doping. The coupon collection (non-coding) strategy is consistently worse by an order of magnitude 
than the RS-based fountain encoding and is worse than polling for high source densities (small squads with tens of nodes). 



Initialization: 

k = Q,D = Q 
For (i = 1, D < k, i++)
Calculate A'"^) {U) 

Using (O, calculate Pr {Y^ ^ t} fov t < k - k 
Using (?), calculate $E[Y_i]$
$D = D + E[Y_i]$

$k_d = i$, $p_d = 100\,k_d/k$



Fig. 16. Calculation of the expected doping percentage pd based on the number of upfront collected symbols 



